This is an archive of the discontinued LLVM Phabricator instance.

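Paths

- llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
- llvm/test/CodeGen/AArch64/arm64-bitfield-extract.ll
- llvm/test/CodeGen/AArch64/arm64-non-pow2-ldst.ll
- llvm/test/CodeGen/AArch64/arm64-strict-align.ll
- llvm/test/CodeGen/AArch64/arm64_32.ll
- llvm/test/CodeGen/AArch64/bfis-in-loop.ll
- llvm/test/CodeGen/AArch64/bitfield-insert.ll
- llvm/test/CodeGen/AArch64/build-pair-isel.ll
- llvm/test/CodeGen/AArch64/funnel-shift-rot.ll
- llvm/test/CodeGen/AArch64/load-combine-big-endian.ll
- llvm/test/CodeGen/AArch64/load-combine.ll
- llvm/test/CodeGen/AArch64/logic-shift.ll
- llvm/test/CodeGen/AArch64/nontemporal-load.ll
- llvm/test/CodeGen/AArch64/rotate-extract.ll
- llvm/test/CodeGen/AArch64/trunc-to-tbl.ll
- llvm/test/CodeGen/AArch64/urem-seteq.ll
- llvm/test/CodeGen/AArch64/vec_uaddo.ll
- llvm/test/CodeGen/AArch64/vec_umulo.ll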

Differential D135102

[AArch64] Compare BFI and ORR with left-shifted operand for OR instruction selection.
ClosedPublic

Authored by mingmingl on Oct 3 2022, 1:52 PM.


Details

Reviewers

dmgreen
efriedma
fhahn

Commits

rGf62d8a1a5044: [AArch64] Compare BFI and ORR with left-shifted operand for OR instruction…

Summary

Before this patch:

For r = or op0, op1, tryBitfieldInsertOpFromOr combines the OR into a BFI when:
1. one of the two operands is a bit-field-positioning or bit-field-extraction op; and
2. the bits from the two operands don't overlap.
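The two conditions can be modeled in plain C++ on 64-bit values. This is a minimal sketch, not LLVM code. `bfi` and `orWithClearedField` are hypothetical helpers: the first mimics the AArch64 BFI semantics, and the second is an OR whose operands cannot overlap.

```cpp
#include <cassert>
#include <cstdint>

// Mask with the low `width` bits set (width in [1, 64]).
static uint64_t lowMask(unsigned width) {
  return width >= 64 ? ~0ULL : ((1ULL << width) - 1);
}

// Hypothetical model of "BFI Xd, Xn, #lsb, #width": copy the low `width` bits
// of `src` into `dst` starting at bit `lsb`; all other bits of `dst` are kept.
// Assumes width >= 1 and lsb + width <= 64.
uint64_t bfi(uint64_t dst, uint64_t src, unsigned lsb, unsigned width) {
  uint64_t field = lowMask(width) << lsb;
  return (dst & ~field) | ((src << lsb) & field);
}

// Hypothetical example of r = or op0, op1 where op0 is a
// bit-field-positioning op ((x & 0xf) << 3) and op1 has bits 3..6 cleared,
// so the bits of the two operands never overlap.
uint64_t orWithClearedField(uint64_t x, uint64_t y) {
  uint64_t op0 = (x & 0xf) << 3;     // 4-bit field positioned at bit 3
  uint64_t op1 = y & ~(0xfULL << 3); // bits 3..6 known to be zero
  return op0 | op1;
}
```

Because the fields cannot collide, `orWithClearedField(x, y)` equals `bfi(y, x, 3, 4)`. The AND on op1 therefore becomes redundant once the OR is selected as a BFI.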

After this patch:

Right before the OR is combined into a BFI, tryBitfieldInsertOpFromOr evaluates whether an ORR with a left-shifted operand is better.

A motivating example (https://godbolt.org/z/rnMrzs5vn, added as the test case test_orr_not_bfxil in CodeGen/AArch64/bitfield-insert.ll):

For IR:

define i64 @test_orr_not_bfxil(i64 %0) {
  %2 = and i64 %0, 127
  %3 = lshr i64 %0, 1
  %4 = and i64 %3, 16256
  %5 = or i64 %4, %2
  ret i64 %5
}

Before:

lsr     x8, x0, #1
and     x8, x8, #0x3f80
bfxil   x8, x0, #0, #7

After:

ubfx x8, x0, #8, #7
and x9, x0, #0x7f
orr x0, x9, x8, lsl #7
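The following sketch checks that the Before and After sequences compute the same value as the IR. It emulates each instruction on 64-bit integers. The function names are hypothetical, and any register moves needed to return the value are ignored.

The sketch also makes the dependency argument visible. In the After sequence, `ubfx` and `and` both read only x0, so they can issue in parallel. The Before sequence is a chain of three dependent instructions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reference model of @test_orr_not_bfxil.
uint64_t irSemantics(uint64_t x) {
  uint64_t lo = x & 127;           // %2
  uint64_t mid = (x >> 1) & 16256; // %3, %4 (16256 == 0x3f80)
  return mid | lo;                 // %5
}

// Before: three instructions, each depending on the previous one.
uint64_t beforeSeq(uint64_t x0) {
  uint64_t x8 = x0 >> 1;              // lsr   x8, x0, #1
  x8 &= 0x3f80;                       // and   x8, x8, #0x3f80
  x8 = (x8 & ~0x7fULL) | (x0 & 0x7f); // bfxil x8, x0, #0, #7
  return x8;
}

// After: ubfx and and are independent; only orr depends on both.
uint64_t afterSeq(uint64_t x0) {
  uint64_t x8 = (x0 >> 8) & 0x7f; // ubfx x8, x0, #8, #7
  uint64_t x9 = x0 & 0x7f;        // and  x9, x0, #0x7f
  return x9 | (x8 << 7);          // orr  x0, x9, x8, lsl #7
}
```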

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

mingmingl created this revision. Oct 3 2022, 1:52 PM

Herald added a project: Restricted Project. Oct 3 2022, 1:52 PM

Herald added subscribers: hiraditya, kristof.beyls.

mingmingl requested review of this revision. Oct 3 2022, 1:52 PM

Herald added a project: Restricted Project. Oct 3 2022, 1:52 PM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B190059: Diff 464803. Oct 3 2022, 1:53 PM

Fix code style issues (nested if, etc.)

mingmingl added reviewers: dmgreen, efriedma, fhahn. Oct 4 2022, 12:16 AM

Harbormaster completed remote builds in B190126: Diff 464896. Oct 4 2022, 12:16 AM

mingmingl edited the summary of this revision. Oct 4 2022, 12:16 AM

Actually, the current implementation causes an infinite loop for 'test_nouseful_strb' in 'llvm/test/CodeGen/AArch64/bitfield-insert.ll' (check-llvm hangs at 99%). I will fix that; sorry about it.

An update from investigating the cause of the infinite loop:

How the infinite loop takes place (with the diff as currently shown):
1. By rewriting AND(val, shifted-mask) to shl(and(srl(val,N), mask), N), the patch creates two more SDNodes in the DAG combiner. The added nodes can easily interact badly with existing combining logic, causing a repeated cycle of {node expansion (as this patch does), node combination (existing logic)}. A two-line LLVM IR example is attached in [1] to illustrate this.
2. The infinite loop could be solved by adding the lines in [2] on top of the current patch. However, it's hard to prove that rewriting one DAG node into three (AND(val, shifted-mask) to shl(and(srl(val,N), mask), N)) is not fragile, i.e., that it won't interact badly with future combining logic.
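The rewrite in point 1 is value-preserving, which is exactly why the combiner can undo it. For any shift amount N below the register width, AND(val, Mask << N) equals shl(and(srl(val, N), Mask), N). Below is a small C++ check of that identity, using the mask from the motivating example (0x3f80 == 0x7f << 7). The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: AND(val, ShiftedMask) with ShiftedMask == Mask << N.
// Assumes n < 64.
uint64_t andShiftedMask(uint64_t val, uint64_t mask, unsigned n) {
  return val & (mask << n);
}

// Hypothetical helper: the expanded three-node form
// shl(and(srl(val, N), Mask), N). Assumes n < 64.
uint64_t expandedForm(uint64_t val, uint64_t mask, unsigned n) {
  return ((val >> n) & mask) << n;
}
```

The two forms are interchangeable. A combine that folds the shift pair back into an AND (as described in [1]) and a transformation that expands it again can therefore alternate indefinitely.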

The motivating test case (including the 5-line C++ code, the current codegen, and the optimal codegen) is https://godbolt.org/z/h96b1sGco
- The over-eager BFM usage comes from AArch64DAGToDAGISel::Select. There is no DAG node for the BFM instruction, so BFM selection happens after the DAG combiner. It is written in C++ to see through bit-simplification opportunities between the two operands.
- "Bit-simplification opportunities" means the following: getUsefulBits scans the uses of the ISD::OR (with a limited recursion depth) to shrink the set of useful bits. If some bits can be proven unused by the users, using BFM eliminates DAG nodes and thereby reduces the number of instructions.

What I learnt from 1 and 2:

- Bit-simplification opportunities from BFM should be retained, since in the best cases they eliminate instructions.
- Instruction selection needs to be enhanced to choose between ORR (with a shifted register) and BFM for the motivating test case in 2. One way to do this is to introduce a SelectionDAG node for the BFM instruction and let it go through the DAG combiner for evaluation. getUsefulBits would also need to be rewritten so that it can analyze useful bits before SDNodes are selected; it currently relies on MachineOpcode, i.e., on the users of an SDNode already being selected.

To enhance ISel to choose between 'ORR' and 'BFM', I'm planning to add a SelectionDAG node for 'BFM' (and probably UBFM, since UBFM helps see through useful bits). I'm going to make some revisions and send them out as stacked diffs.

[1] https://godbolt.org/z/qexqzYx1W (the AND with a shifted-mask operand is rewritten by this patch and then combined back into an AND with a shifted mask via shouldFoldConstantShiftPairToMask, causing the infinite loop)

[2]

diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index fc1893b6d61d..a2d4fe280134 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -14220,6 +14220,13 @@ bool AArch64TargetLowering::shouldFoldConstantShiftPairToMask(
     return (!C1 || !C2 || C1->getZExtValue() >= C2->getZExtValue());
   }
 
+  // 1. It's known that N is a shl or srl
+  // 2. If N has one use, and that use could fold shift, return false
+  // FIXME: Extend this from ORR to other ops
+  if(N->hasNUsesOfValue(1, 0) && N->use_begin()->getOpcode() == ISD::OR) {
+    return false;
+  }
+
   return true;
 }

khchen added a subscriber: khchen. Oct 12 2022, 7:52 AM

mingmingl mentioned this in D135844: [AArch64][2/4]Regard (shl val, N) as a potential bit-field-positioning op regardless of the number of uses. Oct 18 2022, 11:08 AM

To generate an ORR with a shifted operand (rather than a BFI) when ORR is better, this patch does a comparison inside tryBitfieldInsertOpFromOr (i.e., after the DAG combiner, alongside the C++-based instruction selection).

This patch optimizes many existing test cases (as shown by updated tests), but introduces one regression.

Before the patch, the generated BFM conveys which bits of Rd and Rm are used (see getUsefulBitsFromBFM).
After this patch, ORRWrs op0, op1, lsl #imm (with a shifted register) is generated instead of BFM in some cases. However, the bit-field usage information for op0 is not preserved. There isn't a way to express it in class SDNode without adding a specialized class that derives from SDNode, as LoadSDNode does.
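Below is a simplified model of the information that is lost. It is in the spirit of getUsefulBitsFromBFM but is not its actual implementation; the struct and function names are hypothetical.

- For a BFI, the destination bits inside the inserted field are overwritten, so they are never useful.
- For an ORR with a left-shifted operand, every useful result bit may still come from op0.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for the useful bits of the two operands.
struct OperandUsefulBits {
  uint64_t dst; // first operand (Rd for BFI, Rn for ORR)
  uint64_t src; // inserted / left-shifted operand
};

static uint64_t lowMask(unsigned width) {
  return width >= 64 ? ~0ULL : ((1ULL << width) - 1);
}

// Hypothetical: BFI Rd, Rn, #lsb, #width. Rd bits inside the field are
// overwritten, so they are never useful; only the low `width` bits of Rn can
// reach the result. Assumes lsb + width <= 64.
OperandUsefulBits usefulBitsForBFI(uint64_t resultUseful, unsigned lsb,
                                   unsigned width) {
  uint64_t field = lowMask(width) << lsb;
  return {resultUseful & ~field, (resultUseful & field) >> lsb};
}

// Hypothetical: ORR Rd, Rn, Rm, LSL #shift. Any useful result bit may come
// from Rn, and Rm's bits are useful wherever they land on a useful result
// bit. Assumes shift < 64.
OperandUsefulBits usefulBitsForOrrLsl(uint64_t resultUseful, unsigned shift) {
  return {resultUseful, resultUseful >> shift};
}
```

With the BFI model, an AND on op0 that only clears bits 16..31 touches no useful bit and can be removed. With the ORR model those bits remain useful, so the AND must stay.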

A side question: is it a typical use case to convey metadata (e.g., that op0 and op1 inside ORR op0, op1 contribute bits that don't overlap) in the SDNode class?

Two alternative options:

1. Introduce a DAG node for BFM, so as to compare BFM and ORR in the DAG combiner.
   - The drawback of generating BFM earlier (i.e., in the DAG combiner) is that all other bit-field processing nodes (AND, SHL, etc.) need to be taught to combine with BFM. In other words, introducing a BFM DAG node without regressing existing combinations requires a lot of work.
   - I had a local patch that actually adds the BFM node, and missed combinations of BFM with existing nodes showed up.
2. Do the transformation in the aarch64-mi-peephole-opt pass.
   - Since aarch64-mi-peephole-opt optimizes based on ISel output, it's a net optimization; unlike the current patch, it has no drawback of lost information.
   - However, an implementation inside aarch64-mi-peephole-opt would handle MachineInstrs and machine opcodes (i.e., ISD::AND shows up in many forms, like ANDWrs, ANDWrr, etc.), and cannot reuse helper functions from the ISel pass.

I think alternative #2 (inside aarch64-mi-peephole-opt) is better than alternative #1, which could be a can of worms due to missed combinations between BFM and existing AND/OR nodes. In some sense it is also better than the current patch, at the cost of more code.

Feedback/thoughts on where (peephole or the current patch) to pursue this optimization would be appreciated!

Harbormaster completed remote builds in B195369: Diff 472121. Oct 31 2022, 4:27 PM

Hmm. I had not considered an ORR with shift to be cheaper than a BFM before. From what I can tell it doesn't seem to be universal across all CPUs, but it does look like it will be faster or equal.

A side question: is it a typical use case to convey metadata (e.g., that op0 and op1 inside ORR op0, op1 contribute bits that don't overlap) in the SDNode class?

That sounds like it would usually be calculated with KnownBits, like in haveNoCommonBitsSet. Unfortunately post-isel the amount of information we can extract is much less than from the generic DAG nodes.
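The idea behind a KnownBits-based disjointness check such as haveNoCommonBitsSet can be sketched with a toy known-zero mask: two values share no set bit if, at every position, at least one of them is known to be zero. The `ToyKnownBits` type and its helpers are hypothetical stand-ins, not LLVM's KnownBits API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical toy known-bits value: `zero` holds the bits proven to be 0.
struct ToyKnownBits {
  uint64_t zero;
};

// Transfer function for (x & imm): every bit cleared in `imm` is known zero.
ToyKnownBits knownBitsOfAndImm(ToyKnownBits x, uint64_t imm) {
  return {x.zero | ~imm};
}

// Transfer function for (x << amt): known zeros move up, and the low `amt`
// bits become known zero. Assumes amt < 64.
ToyKnownBits knownBitsOfShl(ToyKnownBits x, unsigned amt) {
  return {(x.zero << amt) | ((1ULL << amt) - 1)};
}

// Disjointness check: at every bit position, at least one side is known 0.
bool haveNoCommonBitsSetToy(ToyKnownBits a, ToyKnownBits b) {
  return (a.zero | b.zero) == ~0ULL;
}
```

Transfer functions like these need generic opcodes such as ISD::AND and ISD::SHL. Once the operands have been selected into machine nodes, there is nothing for them to match.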

[About the two/three options]

Is the motivating pattern just the one from the commit message, or any bfm that could be a shifted orr? aarch64-mi-peephole-opt is an option - we always run into problems implementing things there but if it is easier to write that is always an option. (The machine combiner too, if scheduling info is useful). Larger patterns might be more difficult though. The (existing) code in DAG2DAG doesn't feel like it scales super well. But equally like you say adding ISel nodes has downsides. What would this look like from GlobalISel? How much code would need to be added to make aarch64-mi-peephole-opt work?

Are the only regressions on uaddo_v4i1 and umulo_v4i1? I'm not against ignoring those, if they are just overflowing nodes on i1 types being awkwardly expanded and it doesn't come up in other places.

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
2835	You say #7 here, but #8 elsewhere, including the test. I think 7 is correct for this example.
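The width question can be checked by reproducing the operand computation from isWorthFoldingIntoOrrWithLeftShift for the motivating example (AndImm = 0x3f80, SrlImm = 1). UBFX Xd, Xn, #lsb, #width is an alias of UBFM Xd, Xn, #lsb, #(lsb + width - 1), so the extracted width is imms - immr + 1. The helper names below are hypothetical, and the bit-counting loops stand in for LLVM's countTrailingZeros and countTrailingOnes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical container for the UBFM operands and the ORR shift amount.
struct UbfmOperands {
  unsigned immr; // first extracted bit (the UBFX lsb)
  unsigned imms; // last extracted bit
  unsigned shl;  // left-shift amount of the ORR operand
};

static unsigned trailingZeros(uint64_t v) {
  unsigned n = 0;
  while (n < 64 && ((v >> n) & 1) == 0)
    ++n;
  return n;
}

static unsigned trailingOnes(uint64_t v) {
  unsigned n = 0;
  while (n < 64 && ((v >> n) & 1) == 1)
    ++n;
  return n;
}

// Hypothetical: operands for and(srl(X, srlImm), andImm).
// Assumes andImm is a non-empty shifted mask and srlImm + tz < 64.
UbfmOperands ubfmFor(uint64_t andImm, unsigned srlImm) {
  unsigned tz = trailingZeros(andImm);
  unsigned maskWidth = trailingOnes(andImm >> tz);
  return {srlImm + tz, srlImm + tz + maskWidth - 1, tz};
}

// UBFX width recovered from the UBFM operands.
unsigned ubfxWidth(const UbfmOperands &ops) { return ops.imms - ops.immr + 1; }
```

For the example this gives immr = 8, imms = 14, and a width of 7, i.e. `ubfx x8, x0, #8, #7` followed by an ORR with `lsl #7`.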

Fix a bug in the width computation (updated the commit message and patch summary as well). Also adjust whitespace in a few test cases (mainly those not updated by utils/update_llc_test_checks.py) to remove irrelevant diffs.

In D135102#3899360, @dmgreen wrote:

Hmm. I had not considered an ORR with shift to be cheaper than a BFM before. From what I can tell it doesn't seem to be universal across all CPUs, but it does look like it will be faster or equal.

Yes, it's true that an ORR with a shifted operand can be the same speed as BFM (e.g. Cortex-A57) or faster (e.g. Neoverse N1, Cortex-A77).

A side question: is it a typical use case to convey metadata (e.g., that op0 and op1 inside ORR op0, op1 contribute bits that don't overlap) in the SDNode class?

That sounds like it would usually be calculated with KnownBits, like in haveNoCommonBitsSet. Unfortunately post-isel the amount of information we can extract is much less than from the generic DAG nodes.

Yes, SelectionDAG::computeKnownBits handles generic DAG nodes but not machine opcodes (nodes with NodeType < 0). As a result, computing something similar to haveNoCommonBitsSet won't work out of the box.

[About the two/three options]

Is the motivating pattern just the one from the commit message, or any bfm that could be a shifted orr?

Yes, the motivating test case is just the one from the commit message. The other updated tests come from the other code paths in this patch; they actually look simpler, and they show that ORR-vs-BFM is a more general question to solve.

aarch64-mi-peephole-opt is an option - we always run into problems implementing things there but if it is easier to write that is always an option. (The machine combiner too, if scheduling info is useful). Larger patterns might be more difficult though. The (existing) code in DAG2DAG doesn't feel like it scales super well. But equally like you say adding ISel nodes has downsides. What would this look like from GlobalISel? How much code would need to be added to make aarch64-mi-peephole-opt work?

GlobalISel does not use BFM here; it generates four instructions (https://godbolt.org/z/MMvMe34zv). So a BFM pattern matcher (inside the peephole pass or the machine combiner) won't optimize GlobalISel output in this case.

Regarding the amount of work inside peephole (or machine-combiner), I don't have a demo at hand but the number of lines should be within a few hundred (not thousand) just for the motivating test case.

However, at that point the pass doesn't know that this BFI comes from an ISD::OR. Building up that context (i.e., proving it's correct to convert the BFI back to an ORR) and fixing the other affected tests in this patch would require additional implementation work.

Are the only regressions on uaddo_v4i1 and umulo_v4i1? I'm not against ignoring those, if they are just overflowing nodes on i1 types being awkwardly expanded and it doesn't come up in other places.

In the affected test cases, only uaddo_v4i1 and umulo_v4i1 regressed. More generally, the useful-bits information (available from BFM, lost with ORR) lets ISel simplify away one AND node from Dst when that AND zeros exactly the bits that are about to be inserted from Src (as shown in the code link). In this sense, other cases (not only type-extended small integers) might show up.
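Below is a toy illustration of that regression. `bfi`, `orrLsl`, and `clearField` are hypothetical models of the instructions, not LLVM code. Suppose Dst is first ANDed with a mask that clears exactly the field about to be inserted. BFI makes that AND redundant. ORR still needs it, because the stale Dst bits would otherwise survive in the result.

```cpp
#include <cassert>
#include <cstdint>

static uint64_t lowMask(unsigned width) {
  return width >= 64 ? ~0ULL : ((1ULL << width) - 1);
}

// Hypothetical model of "BFI Xd, Xn, #lsb, #width" (assumes lsb + width <= 64).
uint64_t bfi(uint64_t dst, uint64_t src, unsigned lsb, unsigned width) {
  uint64_t field = lowMask(width) << lsb;
  return (dst & ~field) | ((src << lsb) & field);
}

// Hypothetical model of "ORR Xd, Xn, Xm, LSL #shift" (assumes shift < 64).
uint64_t orrLsl(uint64_t rn, uint64_t rm, unsigned shift) {
  return rn | (rm << shift);
}

// Hypothetical: the AND on Dst that clears exactly bits [lsb, lsb + width).
uint64_t clearField(uint64_t dst, unsigned lsb, unsigned width) {
  return dst & ~(lowMask(width) << lsb);
}
```

So when the useful-bits analysis drops that AND node for BFM, only the BFI form stays correct without it.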

Maybe I could write a working demo in the peephole pass or the machine combiner for the motivating case as a start? For the rest of the tests, I could file GitHub issues to track them.

mingmingl marked an inline comment as done. Nov 2 2022, 12:50 AM

mingmingl added inline comments.

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
2835	Thanks for the good catch! Fixed it (I overlooked that the width is 'imms - immr + 1' for UBFX).
llvm/test/CodeGen/AArch64/trunc-to-tbl.ll
238–239	This test case is generated by `utils/update_llc_test_checks.py`, but for some reason the whitespace causes a larger diff than expected. I'm going to run the auto-updater in a clean branch and see whether the whitespace diff is expected without this patch.

Harbormaster completed remote builds in B195635: Diff 472519. Nov 2 2022, 1:52 AM

I read through the code. I'm not the biggest expert on this DAGToDAG code, but what is here seems sensible to me. All the tests look OK too.

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
2872	LLVM usually adds a message to every assert. It helps distinguish them, especially when the condition is fairly generic.
llvm/test/CodeGen/AArch64/trunc-to-tbl.ll
238–239	Feel free to regenerate the files that need it and check those in to reduce the differences here.

mingmingl mentioned this in D137296: [NFC][AArch64]Precommit test for D135102. Nov 2 2022, 3:46 PM

mingmingl added a parent revision: D137296: [NFC][AArch64]Precommit test for D135102. Nov 2 2022, 3:47 PM

resolve comments.

Thanks for reviews! PTAL.

llvm/test/CodeGen/AArch64/trunc-to-tbl.ll
238–239	Thanks! Done in https://reviews.llvm.org/D137296

Harbormaster completed remote builds in B195834: Diff 472802. Nov 2 2022, 6:24 PM

This update only runs `git clang-format HEAD~1` (no functional change). Without it, pre-merge checks fail with: ERROR git-clang-format returned an non-zero exit code 1

Harbormaster completed remote builds in B195867: Diff 472846. Nov 2 2022, 11:34 PM

mingmingl mentioned this in rG5d7fdf67f622: [NFC][AArch64]Precommit test for D135102. Nov 3 2022, 10:50 AM

rebase after D137296

mingmingl removed a parent revision: D137296: [NFC][AArch64]Precommit test for D135102. Nov 3 2022, 11:24 AM

Harbormaster completed remote builds in B195966: Diff 472988. Nov 3 2022, 12:04 PM

Thanks. LGTM

This revision is now accepted and ready to land. Nov 3 2022, 12:11 PM

Thanks for the reviews! Going to submit, and will implement the FIXME (for SRL) in follow-up patches.

Closed by commit rGf62d8a1a5044: [AArch64] Compare BFI and ORR with left-shifted operand for OR instruction… (authored by mingmingl). Nov 3 2022, 12:32 PM

This revision was automatically updated to reflect the committed changes.

mingmingl added a commit: rGf62d8a1a5044: [AArch64] Compare BFI and ORR with left-shifted operand for OR instruction….

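Revision Contents

Path (size of change)

- llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp (124 lines)
- llvm/test/CodeGen/AArch64/arm64-bitfield-extract.ll (10 lines)
- llvm/test/CodeGen/AArch64/arm64-non-pow2-ldst.ll (30 lines)
- llvm/test/CodeGen/AArch64/arm64-strict-align.ll (4 lines)
- llvm/test/CodeGen/AArch64/arm64_32.ll (5 lines)
- llvm/test/CodeGen/AArch64/bfis-in-loop.ll (8 lines)
- llvm/test/CodeGen/AArch64/bitfield-insert.ll (17 lines)
- llvm/test/CodeGen/AArch64/build-pair-isel.ll (4 lines)
- llvm/test/CodeGen/AArch64/funnel-shift-rot.ll (3 lines)
- llvm/test/CodeGen/AArch64/load-combine-big-endian.ll (20 lines)
- llvm/test/CodeGen/AArch64/load-combine.ll (24 lines)
- llvm/test/CodeGen/AArch64/logic-shift.ll (3 lines)
- llvm/test/CodeGen/AArch64/nontemporal-load.ll (40 lines)
- llvm/test/CodeGen/AArch64/rotate-extract.ll (4 lines)
- llvm/test/CodeGen/AArch64/trunc-to-tbl.ll (36 lines)
- llvm/test/CodeGen/AArch64/urem-seteq.ll (4 lines)
- llvm/test/CodeGen/AArch64/vec_uaddo.ll (13 lines)
- llvm/test/CodeGen/AArch64/vec_umulo.ll (13 lines)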

Diff 473005

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp

@@ (2,797 lines not shown) @@ static bool tryBitfieldInsertOpFromOrAndImm(SDNode *N, SelectionDAG *CurDAG) {
   SDValue Ops[] = {And.getOperand(0), SDValue(MOVI, 0),
                    CurDAG->getTargetConstant(ImmR, DL, VT),
                    CurDAG->getTargetConstant(ImmS, DL, VT)};
   unsigned Opc = (VT == MVT::i32) ? AArch64::BFMWri : AArch64::BFMXri;
   CurDAG->SelectNodeTo(N, Opc, VT, Ops);
   return true;
 }

+static bool isWorthFoldingIntoOrrWithLeftShift(SDValue Dst,
+                                               SelectionDAG *CurDAG,
+                                               SDValue &LeftShiftedOperand,
+                                               uint64_t &LeftShiftAmount) {
+  // Avoid folding Dst into ORR-with-left-shift if Dst has other uses than ORR.
+  if (!Dst.hasOneUse())
+    return false;
+
+  EVT VT = Dst.getValueType();
+  assert((VT == MVT::i32 || VT == MVT::i64) &&
+         "Caller should guarantee that VT is one of i32 or i64");
+  const unsigned SizeInBits = VT.getSizeInBits();
+
+  SDLoc DL(Dst.getNode());
+  uint64_t AndImm, ShlImm;
+  if (isOpcWithIntImmediate(Dst.getNode(), ISD::AND, AndImm) &&
+      isShiftedMask_64(AndImm)) {
+    // Avoid transforming 'DstOp0' if it has other uses than the AND node.
+    SDValue DstOp0 = Dst.getOperand(0);
+    if (!DstOp0.hasOneUse())
+      return false;
+
+    // An example to illustrate the transformation
+    // From:
+    //    lsr     x8, x1, #1
+    //    and     x8, x8, #0x3f80
+    //    bfxil   x8, x1, #0, #7
+    // To:
+    //    and    x8, x23, #0x7f
+    //    ubfx   x9, x23, #8, #7
+    //    orr    x23, x8, x9, lsl #7
+    //
+    // The number of instructions remains the same, but ORR is faster than BFXIL
+    // on many AArch64 processors (or as good as BFXIL if not faster). Besides,
+    // the dependency chain is improved after the transformation.
+    uint64_t SrlImm;
+    if (isOpcWithIntImmediate(DstOp0.getNode(), ISD::SRL, SrlImm)) {
+      uint64_t NumTrailingZeroInShiftedMask = countTrailingZeros(AndImm);
+      if ((SrlImm + NumTrailingZeroInShiftedMask) < SizeInBits) {
+        unsigned MaskWidth =
+            countTrailingOnes(AndImm >> NumTrailingZeroInShiftedMask);
+        unsigned UBFMOpc =
+            (VT == MVT::i32) ? AArch64::UBFMWri : AArch64::UBFMXri;
+        SDNode *UBFMNode = CurDAG->getMachineNode(
+            UBFMOpc, DL, VT, DstOp0.getOperand(0),
+            CurDAG->getTargetConstant(SrlImm + NumTrailingZeroInShiftedMask, DL,
+                                      VT),
+            CurDAG->getTargetConstant(
+                SrlImm + NumTrailingZeroInShiftedMask + MaskWidth - 1, DL, VT));
+        LeftShiftedOperand = SDValue(UBFMNode, 0);
+        LeftShiftAmount = NumTrailingZeroInShiftedMask;
+        return true;
+      }
+    }
+  } else if (isOpcWithIntImmediate(Dst.getNode(), ISD::SHL, ShlImm)) {
+    LeftShiftedOperand = Dst.getOperand(0);
+    LeftShiftAmount = ShlImm;
+    return true;
+  }
+  // FIXME: Extend the implementation to optimize if Dst is an SRL node.
+  return false;
+}

+static bool tryOrrWithLeftShift(SDNode *N, SDValue OrOpd0, SDValue OrOpd1,
+                                SDValue Src, SDValue Dst, SelectionDAG *CurDAG,
+                                const bool BiggerPattern) {
+  EVT VT = N->getValueType(0);
+  assert((VT == MVT::i32 || VT == MVT::i64) &&
+         "Expect result type to be i32 or i64 since N is combinable to BFM");
+  SDLoc DL(N);
+
+  // Bail out if BFM simplifies away one node in BFM Dst.
+  if (OrOpd1 != Dst)
+    return false;
+
+  // For "BFM Rd, Rn, #immr, #imms", it's known that BFM simplifies away fewer
+  // nodes from Rn (or inserts additional shift node) if BiggerPattern is true.
+  if (BiggerPattern) {
+    uint64_t SrcAndImm;
+    if (isOpcWithIntImmediate(OrOpd0.getNode(), ISD::AND, SrcAndImm) &&
+        isMask_64(SrcAndImm) && OrOpd0.getOperand(0) == Src) {
+      // OrOpd0 = AND Src, #Mask
+      // So BFM simplifies away one AND node from Src and doesn't simplify away
+      // nodes from Dst. If ORR with left-shifted operand also simplifies away
+      // one node (from Rd), ORR is better since it has higher throughput and
+      // smaller latency than BFM on many AArch64 processors (and for the rest
+      // ORR is at least as good as BFM).
+      SDValue LeftShiftedOperand;
+      uint64_t LeftShiftAmount;
+      if (isWorthFoldingIntoOrrWithLeftShift(Dst, CurDAG, LeftShiftedOperand,
+                                             LeftShiftAmount)) {
+        unsigned OrrOpc = (VT == MVT::i32) ? AArch64::ORRWrs : AArch64::ORRXrs;
+        SDValue Ops[] = {OrOpd0, LeftShiftedOperand,
+                         CurDAG->getTargetConstant(LeftShiftAmount, DL, VT)};
+        CurDAG->SelectNodeTo(N, OrrOpc, VT, Ops);
+        return true;
+      }
+    }
+    return false;
+  }
+
+  assert((!BiggerPattern) && "BiggerPattern should be handled above");
+
+  uint64_t ShlImm;
+  // FIXME: Extend the implementation if OrOpd0 is an SRL node.
+  if (isOpcWithIntImmediate(OrOpd0.getNode(), ISD::SHL, ShlImm) &&
+      OrOpd0.getOperand(0) == Src && OrOpd0.hasOneUse()) {
+    unsigned OrrOpc = (VT == MVT::i32) ? AArch64::ORRWrs : AArch64::ORRXrs;
+    SDValue Ops[] = {Dst, Src, CurDAG->getTargetConstant(ShlImm, DL, VT)};
+    CurDAG->SelectNodeTo(N, OrrOpc, VT, Ops);
+    return true;
+  }
+
+  return false;
+}

 static bool tryBitfieldInsertOpFromOr(SDNode *N, const APInt &UsefulBits,
                                       SelectionDAG *CurDAG) {
   assert(N->getOpcode() == ISD::OR && "Expect a OR operation");

   EVT VT = N->getValueType(0);
   if (VT != MVT::i32 && VT != MVT::i64)
     return false;

@@ (86 lines not shown) @@
     if (isOpcWithIntImmediate(OrOpd1, ISD::AND, Imm) &&
         isBitfieldDstMask(Imm, BitsToBeInserted, NumberOfIgnoredHighBits, VT))
       // In that case, we can eliminate the AND
       Dst = OrOpd1->getOperand(0);
     else
       // Maybe the AND has been removed by simplify-demanded-bits
       // or is useful because it discards more bits
       Dst = OrOpd1Val;

+    // Before selecting ISD::OR node to AArch64::BFM, see if an AArch64::ORR
+    // with left-shifted operand is more efficient.
+    // FIXME: Extend this to compare AArch64::BFM and AArch64::ORR with
+    // right-shifted operand as well.
+    if (tryOrrWithLeftShift(N, OrOpd0Val, OrOpd1Val, Src, Dst, CurDAG,
+                            BiggerPattern))
+      return true;

     // both parts match
     SDLoc DL(N);
     SDValue Ops[] = {Dst, Src, CurDAG->getTargetConstant(ImmR, DL, VT),
                      CurDAG->getTargetConstant(ImmS, DL, VT)};
     unsigned Opc = (VT == MVT::i32) ? AArch64::BFMWri : AArch64::BFMXri;
     CurDAG->SelectNodeTo(N, Opc, VT, Ops);
     return true;
   }
@@ (2,553 lines not shown) @@

llvm/test/CodeGen/AArch64/arm64-bitfield-extract.ll

@@ (958 lines not shown) @@ entry:
   %arrayidx = getelementptr inbounds [8 x [64 x i64]], [8 x [64 x i64]]* @arr, i64 0, i64 0, i64 %and
   %0 = load i64, i64* %arrayidx, align 8
   ret i64 %0
 }

 define i16 @test_ignored_rightbits(i32 %dst, i32 %in) {
 ; LLC-LABEL: test_ignored_rightbits:
 ; LLC: // %bb.0:
-; LLC-NEXT: and w0, w0, #0x7
-; LLC-NEXT: bfi w0, w1, #3, #4
-; LLC-NEXT: bfi w0, w0, #8, #7
+; LLC-NEXT: and w8, w0, #0x7
+; LLC-NEXT: bfi w8, w1, #3, #4
+; LLC-NEXT: orr w0, w8, w8, lsl #8
 ; LLC-NEXT: ret
 ; OPT-LABEL: @test_ignored_rightbits(
 ; OPT-NEXT: [[POSITIONED_FIELD:%.*]] = shl i32 [[IN:%.*]], 3
 ; OPT-NEXT: [[POSITIONED_MASKED_FIELD:%.*]] = and i32 [[POSITIONED_FIELD]], 120
 ; OPT-NEXT: [[MASKED_DST:%.*]] = and i32 [[DST:%.*]], 7
 ; OPT-NEXT: [[INSERTION:%.*]] = or i32 [[MASKED_DST]], [[POSITIONED_MASKED_FIELD]]
 ; OPT-NEXT: [[SHL16:%.*]] = shl i32 [[INSERTION]], 8
 ; OPT-NEXT: [[OR18:%.*]] = or i32 [[SHL16]], [[INSERTION]]
@@ (17 lines not shown) @@
 define void @sameOperandBFI(i64 %src, i64 %src2, i16 *%ptr) {
 ; LLC-LABEL: sameOperandBFI:
 ; LLC: // %bb.0: // %entry
 ; LLC-NEXT: cbnz wzr, .LBB30_2
 ; LLC-NEXT: // %bb.1: // %if.else
 ; LLC-NEXT: lsr x8, x0, #47
 ; LLC-NEXT: and w9, w1, #0x3
 ; LLC-NEXT: bfi w9, w8, #2, #2
-; LLC-NEXT: bfi w9, w9, #4, #4
-; LLC-NEXT: strh w9, [x2]
+; LLC-NEXT: orr w8, w9, w9, lsl #4
+; LLC-NEXT: strh w8, [x2]
 ; LLC-NEXT: .LBB30_2: // %end
 ; LLC-NEXT: ret
 ; OPT-LABEL: @sameOperandBFI(
 ; OPT-NEXT: entry:
 ; OPT-NEXT: [[SHR47:%.*]] = lshr i64 [[SRC:%.*]], 47
 ; OPT-NEXT: [[SRC2_TRUNC:%.*]] = trunc i64 [[SRC2:%.*]] to i32
 ; OPT-NEXT: br i1 undef, label [[END:%.*]], label [[IF_ELSE:%.*]]
 ; OPT: if.else:
@@ (32 lines not shown) @@

llvm/test/CodeGen/AArch64/arm64-non-pow2-ldst.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc < %s -mtriple=arm64-eabi -verify-machineinstrs | FileCheck %s

 define i24 @ldi24(ptr %p) nounwind {
 ; CHECK-LABEL: ldi24:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: ldrb w8, [x0, #2]
-; CHECK-NEXT: ldrh w0, [x0]
-; CHECK-NEXT: bfi w0, w8, #16, #16
+; CHECK-NEXT: ldrh w9, [x0]
+; CHECK-NEXT: orr w0, w9, w8, lsl #16
 ; CHECK-NEXT: ret
   %r = load i24, i24* %p
   ret i24 %r
 }

 define i56 @ldi56(ptr %p) nounwind {
 ; CHECK-LABEL: ldi56:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: ldrb w8, [x0, #6]
 ; CHECK-NEXT: ldrh w9, [x0, #4]
-; CHECK-NEXT: ldr w0, [x0]
-; CHECK-NEXT: bfi w9, w8, #16, #16
-; CHECK-NEXT: bfi x0, x9, #32, #32
+; CHECK-NEXT: ldr w10, [x0]
+; CHECK-NEXT: orr w8, w9, w8, lsl #16
+; CHECK-NEXT: orr x0, x10, x8, lsl #32
 ; CHECK-NEXT: ret
   %r = load i56, i56* %p
   ret i56 %r
 }

 define i80 @ldi80(ptr %p) nounwind {
 ; CHECK-LABEL: ldi80:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: ldr x8, [x0]
 ; CHECK-NEXT: ldrh w1, [x0, #8]
 ; CHECK-NEXT: mov x0, x8
 ; CHECK-NEXT: ret
   %r = load i80, i80* %p
   ret i80 %r
 }

 define i120 @ldi120(ptr %p) nounwind {
 ; CHECK-LABEL: ldi120:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: ldrb w8, [x0, #14]
 ; CHECK-NEXT: ldrh w9, [x0, #12]
-; CHECK-NEXT: ldr w1, [x0, #8]
+; CHECK-NEXT: ldr w10, [x0, #8]
 ; CHECK-NEXT: ldr x0, [x0]
-; CHECK-NEXT: bfi w9, w8, #16, #16
-; CHECK-NEXT: bfi x1, x9, #32, #32
+; CHECK-NEXT: orr w8, w9, w8, lsl #16
+; CHECK-NEXT: orr x1, x10, x8, lsl #32
 ; CHECK-NEXT: ret
   %r = load i120, i120* %p
   ret i120 %r
 }

 define i280 @ldi280(ptr %p) nounwind {
 ; CHECK-LABEL: ldi280:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: ldp x8, x1, [x0]
 ; CHECK-NEXT: ldrb w9, [x0, #34]
-; CHECK-NEXT: ldrh w4, [x0, #32]
+; CHECK-NEXT: ldrh w10, [x0, #32]
 ; CHECK-NEXT: ldp x2, x3, [x0, #16]
 ; CHECK-NEXT: mov x0, x8
-; CHECK-NEXT: bfi x4, x9, #16, #8
+; CHECK-NEXT: orr x4, x10, x9, lsl #16
 ; CHECK-NEXT: ret
   %r = load i280, i280* %p
   ret i280 %r
 }

 define void @sti24(ptr %p, i24 %a) nounwind {
 ; CHECK-LABEL: sti24:
 ; CHECK: // %bb.0:
@@ (58 lines not shown) @@
 define void @i56_or(ptr %a) {
 ; CHECK-LABEL: i56_or:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: mov x8, x0
 ; CHECK-NEXT: ldr w9, [x0]
 ; CHECK-NEXT: ldrh w10, [x8, #4]!
 ; CHECK-NEXT: ldrb w11, [x8, #2]
 ; CHECK-NEXT: orr w9, w9, #0x180
-; CHECK-NEXT: bfi w10, w11, #16, #16
+; CHECK-NEXT: orr w10, w10, w11, lsl #16
 ; CHECK-NEXT: str w9, [x0]
 ; CHECK-NEXT: strb w11, [x8, #2]
 ; CHECK-NEXT: strh w10, [x8]
 ; CHECK-NEXT: ret
   %aa = load i56, ptr %a, align 1
   %b = or i56 %aa, 384
   store i56 %b, ptr %a, align 1
   ret void
 }

 define void @i56_and_or(ptr %a) {
 ; CHECK-LABEL: i56_and_or:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: mov x8, x0
 ; CHECK-NEXT: ldr w9, [x0]
 ; CHECK-NEXT: ldrh w10, [x8, #4]!
 ; CHECK-NEXT: ldrb w11, [x8, #2]
 ; CHECK-NEXT: orr w9, w9, #0x180
 ; CHECK-NEXT: and w9, w9, #0xffffff80
-; CHECK-NEXT: bfi w10, w11, #16, #16
+; CHECK-NEXT: orr w10, w10, w11, lsl #16
 ; CHECK-NEXT: strb w11, [x8, #2]
 ; CHECK-NEXT: str w9, [x0]
 ; CHECK-NEXT: strh w10, [x8]
 ; CHECK-NEXT: ret
   %b = load i56, ptr %a, align 1
   %c = and i56 %b, -128
   %d = or i56 %c, 384
   store i56 %d, ptr %a, align 1
   ret void
 }

	define void @i56_insert_bit(ptr %a, i1 zeroext %bit) {			define void @i56_insert_bit(ptr %a, i1 zeroext %bit) {
	; CHECK-LABEL: i56_insert_bit:			; CHECK-LABEL: i56_insert_bit:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: mov x8, x0			; CHECK-NEXT: mov x8, x0
	; CHECK-NEXT: ldr w11, [x0]			; CHECK-NEXT: ldr w11, [x0]
	; CHECK-NEXT: ldrh w9, [x8, #4]!			; CHECK-NEXT: ldrh w9, [x8, #4]!
	; CHECK-NEXT: ldrb w10, [x8, #2]			; CHECK-NEXT: ldrb w10, [x8, #2]
	; CHECK-NEXT: bfi w9, w10, #16, #8			; CHECK-NEXT: orr w9, w9, w10, lsl #16
	; CHECK-NEXT: strb w10, [x8, #2]			; CHECK-NEXT: strb w10, [x8, #2]
	; CHECK-NEXT: bfi x11, x9, #32, #24			; CHECK-NEXT: orr x11, x11, x9, lsl #32
	; CHECK-NEXT: strh w9, [x8]
	; CHECK-NEXT: and x11, x11, #0xffffffffffffdfff			; CHECK-NEXT: and x11, x11, #0xffffffffffffdfff
				; CHECK-NEXT: strh w9, [x8]
	; CHECK-NEXT: orr w11, w11, w1, lsl #13			; CHECK-NEXT: orr w11, w11, w1, lsl #13
	; CHECK-NEXT: str w11, [x0]			; CHECK-NEXT: str w11, [x0]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%extbit = zext i1 %bit to i56			%extbit = zext i1 %bit to i56
	%b = load i56, ptr %a, align 1			%b = load i56, ptr %a, align 1
	%extbit.shl = shl nuw nsw i56 %extbit, 13			%extbit.shl = shl nuw nsw i56 %extbit, 13
	%c = and i56 %b, -8193			%c = and i56 %b, -8193
	%d = or i56 %c, %extbit.shl			%d = or i56 %c, %extbit.shl
	store i56 %d, ptr %a, align 1			store i56 %d, ptr %a, align 1
	ret void			ret void
	}			}
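The BFI-to-ORR rewrites in this file are only valid because the two operands have no set bits in common. `ldrh` and `ldrb` zero-extend, so the inserted byte lands on bits that are already zero in the destination. The Python sketch below spot-checks this for the `ldi280` and `i56_*` sequences. The helpers `bfi` and `orr_lsl` are hypothetical models of the instructions on Python ints, not LLVM code, and the sample values are arbitrary.

```python
def bfi(dst, src, lsb, width, bits):
    """Model of `bfi Rd, Rn, #lsb, #width` on a `bits`-wide register."""
    field = ((1 << width) - 1) << lsb
    return ((dst & ~field) | ((src << lsb) & field)) & ((1 << bits) - 1)


def orr_lsl(a, b, shift, bits):
    """Model of `orr Rd, Ra, Rb, lsl #shift` on a `bits`-wide register."""
    return (a | (b << shift)) & ((1 << bits) - 1)


# Values as produced by zero-extending loads: ldrh -> 16 bits, ldrb -> 8 bits.
halfwords = (0x0000, 0x0001, 0x00FF, 0x8000, 0xFFFF)
bytes_ = (0x00, 0x01, 0x7F, 0x80, 0xFF)

for h in halfwords:
    for b in bytes_:
        # ldi280: bfi x4, x9, #16, #8  vs  orr x4, x10, x9, lsl #16
        assert bfi(h, b, 16, 8, 64) == orr_lsl(h, b, 16, 64)
        # i56_or: bfi w10, w11, #16, #16  vs  orr w10, w10, w11, lsl #16
        assert bfi(h, b, 16, 16, 32) == orr_lsl(h, b, 16, 32)

# If the destination already had bits set inside the field, the two differ:
# BFI overwrites the field, while ORR merges into it.
assert bfi(0xFFFFFFFF, 0, 16, 8, 32) == 0xFF00FFFF
assert orr_lsl(0xFFFFFFFF, 0, 16, 32) == 0xFFFFFFFF
```

This is why the selection code has to establish that the bits do not overlap before choosing ORR in place of BFI.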

llvm/test/CodeGen/AArch64/arm64-strict-align.ll

	; RUN: llc < %s -mtriple=arm64-apple-darwin | FileCheck %s			; RUN: llc < %s -mtriple=arm64-apple-darwin | FileCheck %s
	; RUN: llc < %s -mtriple=arm64-apple-darwin -mattr=+strict-align | FileCheck %s --check-prefix=CHECK-STRICT			; RUN: llc < %s -mtriple=arm64-apple-darwin -mattr=+strict-align | FileCheck %s --check-prefix=CHECK-STRICT
	; RUN: llc < %s -mtriple=arm64-apple-darwin -mattr=+strict-align -fast-isel | FileCheck %s --check-prefix=CHECK-STRICT			; RUN: llc < %s -mtriple=arm64-apple-darwin -mattr=+strict-align -fast-isel | FileCheck %s --check-prefix=CHECK-STRICT

	define i32 @f0(i32* nocapture %p) nounwind {			define i32 @f0(i32* nocapture %p) nounwind {
	; CHECK-STRICT: ldrh [[HIGH:w[0-9]+]], [x0, #2]			; CHECK-STRICT: ldrh [[HIGH:w[0-9]+]], [x0, #2]
	; CHECK-STRICT: ldrh [[LOW:w[0-9]+]], [x0]			; CHECK-STRICT: ldrh [[LOW:w[0-9]+]], [x0]
	; CHECK-STRICT: bfi [[LOW]], [[HIGH]], #16, #16			; CHECK-STRICT: orr w0, [[LOW]], [[HIGH]], lsl #16
	; CHECK-STRICT: ret			; CHECK-STRICT: ret

	; CHECK: ldr w0, [x0]			; CHECK: ldr w0, [x0]
	; CHECK: ret			; CHECK: ret
	%tmp = load i32, i32* %p, align 2			%tmp = load i32, i32* %p, align 2
	ret i32 %tmp			ret i32 %tmp
	}			}

	define i64 @f1(i64* nocapture %p) nounwind {			define i64 @f1(i64* nocapture %p) nounwind {
	; CHECK-STRICT: ldp w[[LOW:[0-9]+]], w[[HIGH:[0-9]+]], [x0]			; CHECK-STRICT: ldp w[[LOW:[0-9]+]], w[[HIGH:[0-9]+]], [x0]
	; CHECK-STRICT: bfi x[[LOW]], x[[HIGH]], #32, #32			; CHECK-STRICT: orr x0, x[[LOW]], x[[HIGH]], lsl #32
	; CHECK-STRICT: ret			; CHECK-STRICT: ret

	; CHECK: ldr x0, [x0]			; CHECK: ldr x0, [x0]
	; CHECK: ret			; CHECK: ret
	%tmp = load i64, i64* %p, align 4			%tmp = load i64, i64* %p, align 4
	ret i64 %tmp			ret i64 %tmp
	}			}
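With `-mattr=+strict-align`, the CHECK-STRICT lines above rebuild the i32 in `@f0` from two halfword loads and the i64 in `@f1` from a word pair, then merge the halves with a shifted ORR. The Python sketch below assumes little-endian memory (as on `arm64-apple-darwin`) and uses hypothetical function names. It checks that the merged value equals a single wide load of the same bytes.

```python
def orr_lsl(a, b, shift, bits):
    """Model of `orr Rd, Ra, Rb, lsl #shift` on a `bits`-wide register."""
    return (a | (b << shift)) & ((1 << bits) - 1)


def f0_strict(mem):
    # ldrh HIGH, [x0, #2] ; ldrh LOW, [x0] ; orr w0, LOW, HIGH, lsl #16
    high = int.from_bytes(mem[2:4], "little")
    low = int.from_bytes(mem[0:2], "little")
    return orr_lsl(low, high, 16, 32)


def f1_strict(mem):
    # ldp wLOW, wHIGH, [x0] ; orr x0, xLOW, xHIGH, lsl #32
    low = int.from_bytes(mem[0:4], "little")
    high = int.from_bytes(mem[4:8], "little")
    return orr_lsl(low, high, 32, 64)


mem = bytes([0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88])
# Both match what the non-strict `ldr w0, [x0]` / `ldr x0, [x0]` would load.
assert f0_strict(mem) == int.from_bytes(mem[:4], "little")  # 0x44332211
assert f1_strict(mem) == int.from_bytes(mem, "little")      # 0x8877665544332211
```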

llvm/test/CodeGen/AArch64/arm64_32.ll

...
; CHECK: str w0,
ret void		ret void
}		}


define void @test_struct_hi(i32 %hi) nounwind {		define void @test_struct_hi(i32 %hi) nounwind {
; CHECK-LABEL: test_struct_hi:		; CHECK-LABEL: test_struct_hi:
; CHECK: mov w[[IN:[0-9]+]], w0		; CHECK: mov w[[IN:[0-9]+]], w0
; CHECK: bl _get_int		; CHECK: bl _get_int
; CHECK-FAST-NEXT: mov w0, w0		; CHECK-FAST-NEXT: mov w[[DST:[0-9]+]], w0
; CHECK-NEXT: bfi x0, x[[IN]], #32, #32		; CHECK-FAST-NEXT: orr x0, x[[DST]], x[[IN]], lsl #32
		; CHECK-OPT-NEXT: bfi x0, x[[IN]], #32, #32
; CHECK-NEXT: bl _take_pair		; CHECK-NEXT: bl _take_pair
%val.64 = call i64 @get_int()		%val.64 = call i64 @get_int()
%val.32 = trunc i64 %val.64 to i32		%val.32 = trunc i64 %val.64 to i32

%pair.0 = insertvalue [2 x i32] undef, i32 %val.32, 0		%pair.0 = insertvalue [2 x i32] undef, i32 %val.32, 0
%pair.1 = insertvalue [2 x i32] %pair.0, i32 %hi, 1		%pair.1 = insertvalue [2 x i32] %pair.0, i32 %hi, 1
call void @take_pair([2 x i32] %pair.1)		call void @take_pair([2 x i32] %pair.1)

...

llvm/test/CodeGen/AArch64/bfis-in-loop.ll

	...
	; CHECK-NEXT: ldrh w10, [x9, #72]			; CHECK-NEXT: ldrh w10, [x9, #72]
	; CHECK-NEXT: cmp w10, #0			; CHECK-NEXT: cmp w10, #0
	; CHECK-NEXT: ubfx x11, x10, #8, #24			; CHECK-NEXT: ubfx x11, x10, #8, #24
	; CHECK-NEXT: cset w12, ne			; CHECK-NEXT: cset w12, ne
	; CHECK-NEXT: csel w8, w8, w11, eq			; CHECK-NEXT: csel w8, w8, w11, eq
	; CHECK-NEXT: ldr x11, [x9, #8]			; CHECK-NEXT: ldr x11, [x9, #8]
	; CHECK-NEXT: and x9, x10, #0xff			; CHECK-NEXT: and x9, x10, #0xff
	; CHECK-NEXT: and x10, x0, #0xffffffff00000000			; CHECK-NEXT: and x10, x0, #0xffffffff00000000
	; CHECK-NEXT: bfi x9, x8, #8, #32			; CHECK-NEXT: orr x9, x9, x8, lsl #8
	; CHECK-NEXT: bfi x10, x12, #16, #1			; CHECK-NEXT: orr x10, x10, x12, lsl #16
	; CHECK-NEXT: orr x0, x10, x9			; CHECK-NEXT: orr x0, x10, x9
	; CHECK-NEXT: ldr x9, [x11, #16]			; CHECK-NEXT: ldr x9, [x11, #16]
	; CHECK-NEXT: cbnz x11, .LBB0_1			; CHECK-NEXT: cbnz x11, .LBB0_1
	; CHECK-NEXT: // %bb.2: // %exit			; CHECK-NEXT: // %bb.2: // %exit
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%var = load %struct.wobble, %struct.wobble* getelementptr inbounds (%struct.bar, %struct.bar* @global, i64 0, i32 0, i32 0), align 8			%var = load %struct.wobble, %struct.wobble* getelementptr inbounds (%struct.bar, %struct.bar* @global, i64 0, i32 0, i32 0), align 8
	br label %preheader			br label %preheader
	...
	; CHECK-NEXT: ldrh w10, [x9, #72]			; CHECK-NEXT: ldrh w10, [x9, #72]
	; CHECK-NEXT: cmp w10, #0			; CHECK-NEXT: cmp w10, #0
	; CHECK-NEXT: ubfx x11, x10, #8, #24			; CHECK-NEXT: ubfx x11, x10, #8, #24
	; CHECK-NEXT: cset w12, ne			; CHECK-NEXT: cset w12, ne
	; CHECK-NEXT: csel w8, w8, w11, eq			; CHECK-NEXT: csel w8, w8, w11, eq
	; CHECK-NEXT: ldr x11, [x9, #8]			; CHECK-NEXT: ldr x11, [x9, #8]
	; CHECK-NEXT: and x9, x10, #0xff			; CHECK-NEXT: and x9, x10, #0xff
	; CHECK-NEXT: and x10, x0, #0xffffffff00000000			; CHECK-NEXT: and x10, x0, #0xffffffff00000000
	; CHECK-NEXT: bfi x9, x8, #8, #32			; CHECK-NEXT: orr x9, x9, x8, lsl #8
	; CHECK-NEXT: bfi x10, x12, #16, #1			; CHECK-NEXT: orr x10, x10, x12, lsl #16
	; CHECK-NEXT: orr x0, x10, x9			; CHECK-NEXT: orr x0, x10, x9
	; CHECK-NEXT: ldr x9, [x11, #16]			; CHECK-NEXT: ldr x9, [x11, #16]
	; CHECK-NEXT: cbnz x11, .LBB1_1			; CHECK-NEXT: cbnz x11, .LBB1_1
	; CHECK-NEXT: // %bb.2: // %exit			; CHECK-NEXT: // %bb.2: // %exit
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%var = load %struct.wobble, %struct.wobble* getelementptr inbounds (%struct.bar, %struct.bar* @global, i64 0, i32 0, i32 0), align 8			%var = load %struct.wobble, %struct.wobble* getelementptr inbounds (%struct.bar, %struct.bar* @global, i64 0, i32 0, i32 0), align 8
	br label %preheader			br label %preheader
	...

llvm/test/CodeGen/AArch64/bitfield-insert.ll

	...
	; Tests when all the bits from one operand are not useful			; Tests when all the bits from one operand are not useful
	define i32 @test_nouseful_bits(i8 %a, i32 %b) {			define i32 @test_nouseful_bits(i8 %a, i32 %b) {
	; CHECK-LABEL: test_nouseful_bits:			; CHECK-LABEL: test_nouseful_bits:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: and w8, w0, #0xff			; CHECK-NEXT: and w8, w0, #0xff
	; CHECK-NEXT: lsl w8, w8, #8			; CHECK-NEXT: lsl w8, w8, #8
	; CHECK-NEXT: mov w9, w8			; CHECK-NEXT: mov w9, w8
	; CHECK-NEXT: bfxil w9, w0, #0, #8			; CHECK-NEXT: bfxil w9, w0, #0, #8
	; CHECK-NEXT: bfi w8, w9, #16, #16			; CHECK-NEXT: orr w0, w8, w9, lsl #16
	; CHECK-NEXT: mov w0, w8
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%conv = zext i8 %a to i32 ; 0 0 0 A			%conv = zext i8 %a to i32 ; 0 0 0 A
	%shl = shl i32 %b, 8 ; B2 B1 B0 0			%shl = shl i32 %b, 8 ; B2 B1 B0 0
	%or = or i32 %conv, %shl ; B2 B1 B0 A			%or = or i32 %conv, %shl ; B2 B1 B0 A
	%shl.1 = shl i32 %or, 8 ; B1 B0 A 0			%shl.1 = shl i32 %or, 8 ; B1 B0 A 0
	%or.1 = or i32 %conv, %shl.1 ; B1 B0 A A			%or.1 = or i32 %conv, %shl.1 ; B1 B0 A A
	%shl.2 = shl i32 %or.1, 8 ; B0 A A 0			%shl.2 = shl i32 %or.1, 8 ; B0 A A 0
	%or.2 = or i32 %conv, %shl.2 ; B0 A A A			%or.2 = or i32 %conv, %shl.2 ; B0 A A A
	...
	; some AArch64 processors (for the rest, orr is at least as good as bfm)			; some AArch64 processors (for the rest, orr is at least as good as bfm)
	;			;
	; ubfx x8, x0, #8, #7			; ubfx x8, x0, #8, #7
	; and x9, x0, #0x7f			; and x9, x0, #0x7f
	; orr x0, x9, x8, lsl #7			; orr x0, x9, x8, lsl #7
	define i64 @test_orr_not_bfxil_i64(i64 %0) {			define i64 @test_orr_not_bfxil_i64(i64 %0) {
	; CHECK-LABEL: test_orr_not_bfxil_i64:			; CHECK-LABEL: test_orr_not_bfxil_i64:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: lsr x8, x0, #1			; CHECK-NEXT: ubfx x8, x0, #8, #7
	; CHECK-NEXT: and x8, x8, #0x3f80			; CHECK-NEXT: and x9, x0, #0x7f
	; CHECK-NEXT: bfxil x8, x0, #0, #7			; CHECK-NEXT: orr x0, x9, x8, lsl #7
	; CHECK-NEXT: mov x0, x8
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%2 = and i64 %0, 127			%2 = and i64 %0, 127
	%3 = lshr i64 %0, 1			%3 = lshr i64 %0, 1
	%4 = and i64 %3, 16256 ; 0x3f80			%4 = and i64 %3, 16256 ; 0x3f80
	%5 = or i64 %4, %2			%5 = or i64 %4, %2
	ret i64 %5			ret i64 %5
	}			}

	; The 32-bit test for `test_orr_not_bfxil_i64`.			; The 32-bit test for `test_orr_not_bfxil_i64`.
	define i32 @test_orr_not_bfxil_i32(i32 %0) {			define i32 @test_orr_not_bfxil_i32(i32 %0) {
	; CHECK-LABEL: test_orr_not_bfxil_i32:			; CHECK-LABEL: test_orr_not_bfxil_i32:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: lsr w8, w0, #1			; CHECK-NEXT: ubfx w8, w0, #8, #7
	; CHECK-NEXT: and w8, w8, #0x3f80			; CHECK-NEXT: and w9, w0, #0x7f
	; CHECK-NEXT: bfxil w8, w0, #0, #7			; CHECK-NEXT: orr w0, w9, w8, lsl #7
	; CHECK-NEXT: mov w0, w8
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%2 = and i32 %0, 127			%2 = and i32 %0, 127
	%3 = lshr i32 %0, 1			%3 = lshr i32 %0, 1
	%4 = and i32 %3, 16256 ; 0x3f80			%4 = and i32 %3, 16256 ; 0x3f80
	%5 = or i32 %4, %2			%5 = or i32 %4, %2
	ret i32 %5			ret i32 %5
	}			}
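Both the old sequence (`lsr`/`and`/`bfxil`) and the new sequence (`ubfx`/`and`/`orr`) compute `x[6:0] | (x[14:8] << 7)`. The Python sketch below models each instruction and compares both sequences with the IR for the i32 and i64 variants. Function names are hypothetical and the inputs are a small arbitrary sample, so this is a spot check rather than a proof.

```python
def ir_result(x, bits):
    # %2 = and %0, 127 ; %3 = lshr %0, 1 ; %4 = and %3, 16256 ; %5 = or %4, %2
    return ((x & 127) | ((x >> 1) & 16256)) & ((1 << bits) - 1)


def old_seq(x, bits):
    x8 = (x >> 1) & 0x3F80            # lsr x8, x0, #1 ; and x8, x8, #0x3f80
    x8 = (x8 & ~0x7F) | (x & 0x7F)    # bfxil x8, x0, #0, #7
    return x8 & ((1 << bits) - 1)     # mov x0, x8


def new_seq(x, bits):
    x8 = (x >> 8) & 0x7F              # ubfx x8, x0, #8, #7
    x9 = x & 0x7F                     # and x9, x0, #0x7f
    return (x9 | (x8 << 7)) & ((1 << bits) - 1)  # orr x0, x9, x8, lsl #7


for bits in (32, 64):
    mask = (1 << bits) - 1
    for x in (0, 1, 0x7F, 0x80, 0x100, 0xFFFF, 0x12345678, mask):
        x &= mask
        assert ir_result(x, bits) == old_seq(x, bits) == new_seq(x, bits)
```

The new sequence has no dependency on a prior value of the destination, unlike `bfxil`, which reads and rewrites `x8`.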

llvm/test/CodeGen/AArch64/build-pair-isel.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=aarch64 -o - -O0 %s | FileCheck %s			; RUN: llc -mtriple=aarch64 -o - -O0 %s | FileCheck %s
	target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"			target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"
	target triple = "arm64-apple-ios5.0.0"			target triple = "arm64-apple-ios5.0.0"

	; This test checks we don't fail isel due to unhandled build_pair nodes.			; This test checks we don't fail isel due to unhandled build_pair nodes.
	define void @compare_and_swap128() {			define void @compare_and_swap128() {
	; CHECK-LABEL: compare_and_swap128:			; CHECK-LABEL: compare_and_swap128:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: //APP			; CHECK-NEXT: //APP
	; CHECK-NEXT: nop			; CHECK-NEXT: nop
	; CHECK-NEXT: //NO_APP			; CHECK-NEXT: //NO_APP
	; CHECK-NEXT: // implicit-def: $x9			; CHECK-NEXT: // implicit-def: $x9
	; CHECK-NEXT: mov w9, w10			; CHECK-NEXT: mov w9, w10
	; CHECK-NEXT: mov w8, w8			; CHECK-NEXT: mov w8, w8
	; CHECK-NEXT: // kill: def $x8 killed $w8			; CHECK-NEXT: // kill: def $x8 killed $w8
	; CHECK-NEXT: bfi x8, x9, #32, #32			; CHECK-NEXT: orr x8, x8, x9, lsl #32
	; CHECK-NEXT: // implicit-def: $x9			; CHECK-NEXT: // implicit-def: $x9
	; CHECK-NEXT: str x8, [x9]			; CHECK-NEXT: str x8, [x9]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%1 = call i128 asm sideeffect "nop", "=r,~{memory}"()			%1 = call i128 asm sideeffect "nop", "=r,~{memory}"()
	store i128 %1, i128* undef, align 16			store i128 %1, i128* undef, align 16
	ret void			ret void
	}			}

llvm/test/CodeGen/AArch64/funnel-shift-rot.ll

	...
	declare <4 x i32> @llvm.fshr.v4i32(<4 x i32>, <4 x i32>, <4 x i32>)			declare <4 x i32> @llvm.fshr.v4i32(<4 x i32>, <4 x i32>, <4 x i32>)

	; When first 2 operands match, it's a rotate.			; When first 2 operands match, it's a rotate.

	define i8 @rotl_i8_const_shift(i8 %x) {			define i8 @rotl_i8_const_shift(i8 %x) {
	; CHECK-LABEL: rotl_i8_const_shift:			; CHECK-LABEL: rotl_i8_const_shift:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: ubfx w8, w0, #5, #3			; CHECK-NEXT: ubfx w8, w0, #5, #3
	; CHECK-NEXT: bfi w8, w0, #3, #29			; CHECK-NEXT: orr w0, w8, w0, lsl #3
	; CHECK-NEXT: mov w0, w8
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%f = call i8 @llvm.fshl.i8(i8 %x, i8 %x, i8 3)			%f = call i8 @llvm.fshl.i8(i8 %x, i8 %x, i8 3)
	ret i8 %f			ret i8 %f
	}			}
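In `rotl_i8_const_shift`, `ubfx w8, w0, #5, #3` leaves only three bits in `w8`. Inserting `w0 << 3` with BFI therefore gives the same register as merging it with ORR, and the low byte is the i8 rotate. The Python sketch below uses hypothetical helper names. It assumes the upper 24 bits of `w0` may hold arbitrary values, so only the low 8 bits are compared against the rotate.

```python
def bfi32(dst, src, lsb, width):
    """Model of `bfi Wd, Wn, #lsb, #width`."""
    field = ((1 << width) - 1) << lsb
    return ((dst & ~field) | ((src << lsb) & field)) & 0xFFFFFFFF


def rotl8(x, s):
    """Reference i8 rotate-left (llvm.fshl.i8 with both inputs equal)."""
    x &= 0xFF
    return ((x << s) | (x >> (8 - s))) & 0xFF


def old_seq(w0):
    w8 = (w0 >> 5) & 0x7              # ubfx w8, w0, #5, #3
    return bfi32(w8, w0, 3, 29)       # bfi w8, w0, #3, #29 ; mov w0, w8


def new_seq(w0):
    w8 = (w0 >> 5) & 0x7              # ubfx w8, w0, #5, #3
    return (w8 | (w0 << 3)) & 0xFFFFFFFF  # orr w0, w8, w0, lsl #3


for w0 in range(0, 1 << 12, 7):       # includes junk above bit 7
    assert old_seq(w0) == new_seq(w0)
    assert new_seq(w0) & 0xFF == rotl8(w0, 3)
```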

	define i64 @rotl_i64_const_shift(i64 %x) {			define i64 @rotl_i64_const_shift(i64 %x) {
	; CHECK-LABEL: rotl_i64_const_shift:			; CHECK-LABEL: rotl_i64_const_shift:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	...

llvm/test/CodeGen/AArch64/load-combine-big-endian.ll

	...

	; i8* p; // p is 2 byte aligned			; i8* p; // p is 2 byte aligned
	; ((i32) p[0] << 8) | ((i32) p[1] << 16)			; ((i32) p[0] << 8) | ((i32) p[1] << 16)
	define i32 @zext_load_i32_by_i8_shl_8(i32* %arg) {			define i32 @zext_load_i32_by_i8_shl_8(i32* %arg) {
	; CHECK-LABEL: zext_load_i32_by_i8_shl_8:			; CHECK-LABEL: zext_load_i32_by_i8_shl_8:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: ldrb w8, [x0]			; CHECK-NEXT: ldrb w8, [x0]
	; CHECK-NEXT: ldrb w9, [x0, #1]			; CHECK-NEXT: ldrb w9, [x0, #1]
	; CHECK-NEXT: lsl w0, w8, #8			; CHECK-NEXT: lsl w8, w8, #8
	; CHECK-NEXT: bfi w0, w9, #16, #8			; CHECK-NEXT: orr w0, w8, w9, lsl #16
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%tmp = bitcast i32* %arg to i8*			%tmp = bitcast i32* %arg to i8*
	%tmp1 = getelementptr inbounds i8, i8* %tmp, i32 0			%tmp1 = getelementptr inbounds i8, i8* %tmp, i32 0
	%tmp2 = load i8, i8* %tmp1, align 2			%tmp2 = load i8, i8* %tmp1, align 2
	%tmp3 = zext i8 %tmp2 to i32			%tmp3 = zext i8 %tmp2 to i32
	%tmp30 = shl nuw nsw i32 %tmp3, 8			%tmp30 = shl nuw nsw i32 %tmp3, 8
	%tmp4 = getelementptr inbounds i8, i8* %tmp, i32 1			%tmp4 = getelementptr inbounds i8, i8* %tmp, i32 1
	%tmp5 = load i8, i8* %tmp4, align 1			%tmp5 = load i8, i8* %tmp4, align 1
	%tmp6 = zext i8 %tmp5 to i32			%tmp6 = zext i8 %tmp5 to i32
	%tmp7 = shl nuw nsw i32 %tmp6, 16			%tmp7 = shl nuw nsw i32 %tmp6, 16
	%tmp8 = or i32 %tmp7, %tmp30			%tmp8 = or i32 %tmp7, %tmp30
	ret i32 %tmp8			ret i32 %tmp8
	}			}

	; i8* p; // p is 2 byte aligned			; i8* p; // p is 2 byte aligned
	; ((i32) p[0] << 16) | ((i32) p[1] << 24)			; ((i32) p[0] << 16) | ((i32) p[1] << 24)
	define i32 @zext_load_i32_by_i8_shl_16(i32* %arg) {			define i32 @zext_load_i32_by_i8_shl_16(i32* %arg) {
	; CHECK-LABEL: zext_load_i32_by_i8_shl_16:			; CHECK-LABEL: zext_load_i32_by_i8_shl_16:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: ldrb w8, [x0]			; CHECK-NEXT: ldrb w8, [x0]
	; CHECK-NEXT: ldrb w9, [x0, #1]			; CHECK-NEXT: ldrb w9, [x0, #1]
	; CHECK-NEXT: lsl w0, w8, #16			; CHECK-NEXT: lsl w8, w8, #16
	; CHECK-NEXT: bfi w0, w9, #24, #8			; CHECK-NEXT: orr w0, w8, w9, lsl #24
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%tmp = bitcast i32* %arg to i8*			%tmp = bitcast i32* %arg to i8*
	%tmp1 = getelementptr inbounds i8, i8* %tmp, i32 0			%tmp1 = getelementptr inbounds i8, i8* %tmp, i32 0
	%tmp2 = load i8, i8* %tmp1, align 2			%tmp2 = load i8, i8* %tmp1, align 2
	%tmp3 = zext i8 %tmp2 to i32			%tmp3 = zext i8 %tmp2 to i32
	%tmp30 = shl nuw nsw i32 %tmp3, 16			%tmp30 = shl nuw nsw i32 %tmp3, 16
	%tmp4 = getelementptr inbounds i8, i8* %tmp, i32 1			%tmp4 = getelementptr inbounds i8, i8* %tmp, i32 1
	%tmp5 = load i8, i8* %tmp4, align 1			%tmp5 = load i8, i8* %tmp4, align 1
	...

	; i8* p; // p is 2 byte aligned			; i8* p; // p is 2 byte aligned
	; ((i32) p[1] << 8) | ((i32) p[0] << 16)			; ((i32) p[1] << 8) | ((i32) p[0] << 16)
	define i32 @zext_load_i32_by_i8_bswap_shl_8(i32* %arg) {			define i32 @zext_load_i32_by_i8_bswap_shl_8(i32* %arg) {
	; CHECK-LABEL: zext_load_i32_by_i8_bswap_shl_8:			; CHECK-LABEL: zext_load_i32_by_i8_bswap_shl_8:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: ldrb w8, [x0, #1]			; CHECK-NEXT: ldrb w8, [x0, #1]
	; CHECK-NEXT: ldrb w9, [x0]			; CHECK-NEXT: ldrb w9, [x0]
	; CHECK-NEXT: lsl w0, w8, #8			; CHECK-NEXT: lsl w8, w8, #8
	; CHECK-NEXT: bfi w0, w9, #16, #8			; CHECK-NEXT: orr w0, w8, w9, lsl #16
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%tmp = bitcast i32* %arg to i8*			%tmp = bitcast i32* %arg to i8*
	%tmp1 = getelementptr inbounds i8, i8* %tmp, i32 1			%tmp1 = getelementptr inbounds i8, i8* %tmp, i32 1
	%tmp2 = load i8, i8* %tmp1, align 1			%tmp2 = load i8, i8* %tmp1, align 1
	%tmp3 = zext i8 %tmp2 to i32			%tmp3 = zext i8 %tmp2 to i32
	%tmp30 = shl nuw nsw i32 %tmp3, 8			%tmp30 = shl nuw nsw i32 %tmp3, 8
	%tmp4 = getelementptr inbounds i8, i8* %tmp, i32 0			%tmp4 = getelementptr inbounds i8, i8* %tmp, i32 0
	%tmp5 = load i8, i8* %tmp4, align 2			%tmp5 = load i8, i8* %tmp4, align 2
	%tmp6 = zext i8 %tmp5 to i32			%tmp6 = zext i8 %tmp5 to i32
	%tmp7 = shl nuw nsw i32 %tmp6, 16			%tmp7 = shl nuw nsw i32 %tmp6, 16
	%tmp8 = or i32 %tmp7, %tmp30			%tmp8 = or i32 %tmp7, %tmp30
	ret i32 %tmp8			ret i32 %tmp8
	}			}

	; i8* p; // p is 2 byte aligned			; i8* p; // p is 2 byte aligned
	; ((i32) p[1] << 16) | ((i32) p[0] << 24)			; ((i32) p[1] << 16) | ((i32) p[0] << 24)
	define i32 @zext_load_i32_by_i8_bswap_shl_16(i32* %arg) {			define i32 @zext_load_i32_by_i8_bswap_shl_16(i32* %arg) {
	; CHECK-LABEL: zext_load_i32_by_i8_bswap_shl_16:			; CHECK-LABEL: zext_load_i32_by_i8_bswap_shl_16:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: ldrb w8, [x0, #1]			; CHECK-NEXT: ldrb w8, [x0, #1]
	; CHECK-NEXT: ldrb w9, [x0]			; CHECK-NEXT: ldrb w9, [x0]
	; CHECK-NEXT: lsl w0, w8, #16			; CHECK-NEXT: lsl w8, w8, #16
	; CHECK-NEXT: bfi w0, w9, #24, #8			; CHECK-NEXT: orr w0, w8, w9, lsl #24
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%tmp = bitcast i32* %arg to i8*			%tmp = bitcast i32* %arg to i8*
	%tmp1 = getelementptr inbounds i8, i8* %tmp, i32 1			%tmp1 = getelementptr inbounds i8, i8* %tmp, i32 1
	%tmp2 = load i8, i8* %tmp1, align 1			%tmp2 = load i8, i8* %tmp1, align 1
	%tmp3 = zext i8 %tmp2 to i32			%tmp3 = zext i8 %tmp2 to i32
	%tmp30 = shl nuw nsw i32 %tmp3, 16			%tmp30 = shl nuw nsw i32 %tmp3, 16
	%tmp4 = getelementptr inbounds i8, i8* %tmp, i32 0			%tmp4 = getelementptr inbounds i8, i8* %tmp, i32 0
	%tmp5 = load i8, i8* %tmp4, align 2			%tmp5 = load i8, i8* %tmp4, align 2
	%tmp6 = zext i8 %tmp5 to i32			%tmp6 = zext i8 %tmp5 to i32
	%tmp7 = shl nuw nsw i32 %tmp6, 24			%tmp7 = shl nuw nsw i32 %tmp6, 24
	%tmp8 = or i32 %tmp7, %tmp30			%tmp8 = or i32 %tmp7, %tmp30
	ret i32 %tmp8			ret i32 %tmp8
	}			}

	; i8* p;			; i8* p;
	; i16* p1.i16 = (i16*) p;			; i16* p1.i16 = (i16*) p;
	; (p1.i16[0] << 8) | ((i16) p[2])			; (p1.i16[0] << 8) | ((i16) p[2])
	;			;
	; This is essentially an i16 load from p[1], but we don't fold the pattern now			; This is essentially an i16 load from p[1], but we don't fold the pattern now
	; because in the original DAG we don't have p[1] address available			; because in the original DAG we don't have p[1] address available
	define i16 @load_i16_from_nonzero_offset(i8* %p) {			define i16 @load_i16_from_nonzero_offset(i8* %p) {
	; CHECK-LABEL: load_i16_from_nonzero_offset:			; CHECK-LABEL: load_i16_from_nonzero_offset:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: ldrh w8, [x0]			; CHECK-NEXT: ldrh w8, [x0]
	; CHECK-NEXT: ldrb w0, [x0, #2]			; CHECK-NEXT: ldrb w9, [x0, #2]
	; CHECK-NEXT: bfi w0, w8, #8, #24			; CHECK-NEXT: orr w0, w9, w8, lsl #8
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%p1.i16 = bitcast i8* %p to i16*			%p1.i16 = bitcast i8* %p to i16*
	%p2.i8 = getelementptr i8, i8* %p, i64 2			%p2.i8 = getelementptr i8, i8* %p, i64 2
	%v1 = load i16, i16* %p1.i16			%v1 = load i16, i16* %p1.i16
	%v2.i8 = load i8, i8* %p2.i8			%v2.i8 = load i8, i8* %p2.i8
	%v2 = zext i8 %v2.i8 to i16			%v2 = zext i8 %v2.i8 to i16
	%v1.shl = shl i16 %v1, 8			%v1.shl = shl i16 %v1, 8
	%res = or i16 %v1.shl, %v2			%res = or i16 %v1.shl, %v2
	ret i16 %res			ret i16 %res
	}			}
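The comment on `load_i16_from_nonzero_offset` says the pattern is essentially an i16 load from `p[1]`. The Python sketch below assumes big-endian memory, as the file name suggests, and models only the three instructions in the new CHECK lines. It confirms that the low 16 bits of `w0` equal a big-endian halfword read at `p + 1`. The function name is hypothetical.

```python
def load_i16_from_nonzero_offset(p):
    w8 = int.from_bytes(p[0:2], "big")     # ldrh w8, [x0] (big-endian)
    w9 = p[2]                              # ldrb w9, [x0, #2]
    w0 = (w9 | (w8 << 8)) & 0xFFFFFFFF     # orr w0, w9, w8, lsl #8
    return w0 & 0xFFFF                     # i16 result: low 16 bits only


for p in (bytes([0xAA, 0xBB, 0xCC]), bytes([0x00, 0xFF, 0x01]), bytes([0x7F, 0x80, 0x00])):
    # p[0] is shifted out of the i16, leaving p[1]:p[2].
    assert load_i16_from_nonzero_offset(p) == int.from_bytes(p[1:3], "big")
```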

llvm/test/CodeGen/AArch64/load-combine.ll

	...

	; i8* p; // p is 2 byte aligned			; i8* p; // p is 2 byte aligned
	; ((i32) p[0] << 8) | ((i32) p[1] << 16)			; ((i32) p[0] << 8) | ((i32) p[1] << 16)
	define i32 @zext_load_i32_by_i8_shl_8(i32* %arg) {			define i32 @zext_load_i32_by_i8_shl_8(i32* %arg) {
	; CHECK-LABEL: zext_load_i32_by_i8_shl_8:			; CHECK-LABEL: zext_load_i32_by_i8_shl_8:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: ldrb w8, [x0]			; CHECK-NEXT: ldrb w8, [x0]
	; CHECK-NEXT: ldrb w9, [x0, #1]			; CHECK-NEXT: ldrb w9, [x0, #1]
	; CHECK-NEXT: lsl w0, w8, #8			; CHECK-NEXT: lsl w8, w8, #8
	; CHECK-NEXT: bfi w0, w9, #16, #8			; CHECK-NEXT: orr w0, w8, w9, lsl #16
	; CHECK-NEXT: ret			; CHECK-NEXT: ret

	%tmp = bitcast i32* %arg to i8*			%tmp = bitcast i32* %arg to i8*
	%tmp1 = getelementptr inbounds i8, i8* %tmp, i32 0			%tmp1 = getelementptr inbounds i8, i8* %tmp, i32 0
	%tmp2 = load i8, i8* %tmp1, align 2			%tmp2 = load i8, i8* %tmp1, align 2
	%tmp3 = zext i8 %tmp2 to i32			%tmp3 = zext i8 %tmp2 to i32
	%tmp30 = shl nuw nsw i32 %tmp3, 8			%tmp30 = shl nuw nsw i32 %tmp3, 8
	%tmp4 = getelementptr inbounds i8, i8* %tmp, i32 1			%tmp4 = getelementptr inbounds i8, i8* %tmp, i32 1
	%tmp5 = load i8, i8* %tmp4, align 1			%tmp5 = load i8, i8* %tmp4, align 1
	%tmp6 = zext i8 %tmp5 to i32			%tmp6 = zext i8 %tmp5 to i32
	%tmp7 = shl nuw nsw i32 %tmp6, 16			%tmp7 = shl nuw nsw i32 %tmp6, 16
	%tmp8 = or i32 %tmp7, %tmp30			%tmp8 = or i32 %tmp7, %tmp30
	ret i32 %tmp8			ret i32 %tmp8
	}			}

	; i8* p; // p is 2 byte aligned			; i8* p; // p is 2 byte aligned
	; ((i32) p[0] << 16) | ((i32) p[1] << 24)			; ((i32) p[0] << 16) | ((i32) p[1] << 24)
	define i32 @zext_load_i32_by_i8_shl_16(i32* %arg) {			define i32 @zext_load_i32_by_i8_shl_16(i32* %arg) {
	; CHECK-LABEL: zext_load_i32_by_i8_shl_16:			; CHECK-LABEL: zext_load_i32_by_i8_shl_16:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: ldrb w8, [x0]			; CHECK-NEXT: ldrb w8, [x0]
	; CHECK-NEXT: ldrb w9, [x0, #1]			; CHECK-NEXT: ldrb w9, [x0, #1]
	; CHECK-NEXT: lsl w0, w8, #16			; CHECK-NEXT: lsl w8, w8, #16
	; CHECK-NEXT: bfi w0, w9, #24, #8			; CHECK-NEXT: orr w0, w8, w9, lsl #24
	; CHECK-NEXT: ret			; CHECK-NEXT: ret

	%tmp = bitcast i32* %arg to i8*			%tmp = bitcast i32* %arg to i8*
	%tmp1 = getelementptr inbounds i8, i8* %tmp, i32 0			%tmp1 = getelementptr inbounds i8, i8* %tmp, i32 0
	%tmp2 = load i8, i8* %tmp1, align 2			%tmp2 = load i8, i8* %tmp1, align 2
	%tmp3 = zext i8 %tmp2 to i32			%tmp3 = zext i8 %tmp2 to i32
	%tmp30 = shl nuw nsw i32 %tmp3, 16			%tmp30 = shl nuw nsw i32 %tmp3, 16
	%tmp4 = getelementptr inbounds i8, i8* %tmp, i32 1			%tmp4 = getelementptr inbounds i8, i8* %tmp, i32 1
	...

	; i8* p; // p is 2 byte aligned			; i8* p; // p is 2 byte aligned
	; ((i32) p[1] << 8) | ((i32) p[0] << 16)			; ((i32) p[1] << 8) | ((i32) p[0] << 16)
	define i32 @zext_load_i32_by_i8_bswap_shl_8(i32* %arg) {			define i32 @zext_load_i32_by_i8_bswap_shl_8(i32* %arg) {
	; CHECK-LABEL: zext_load_i32_by_i8_bswap_shl_8:			; CHECK-LABEL: zext_load_i32_by_i8_bswap_shl_8:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: ldrb w8, [x0, #1]			; CHECK-NEXT: ldrb w8, [x0, #1]
	; CHECK-NEXT: ldrb w9, [x0]			; CHECK-NEXT: ldrb w9, [x0]
	; CHECK-NEXT: lsl w0, w8, #8			; CHECK-NEXT: lsl w8, w8, #8
	; CHECK-NEXT: bfi w0, w9, #16, #8			; CHECK-NEXT: orr w0, w8, w9, lsl #16
	; CHECK-NEXT: ret			; CHECK-NEXT: ret

	%tmp = bitcast i32* %arg to i8*			%tmp = bitcast i32* %arg to i8*
	%tmp1 = getelementptr inbounds i8, i8* %tmp, i32 1			%tmp1 = getelementptr inbounds i8, i8* %tmp, i32 1
	%tmp2 = load i8, i8* %tmp1, align 1			%tmp2 = load i8, i8* %tmp1, align 1
	%tmp3 = zext i8 %tmp2 to i32			%tmp3 = zext i8 %tmp2 to i32
	%tmp30 = shl nuw nsw i32 %tmp3, 8			%tmp30 = shl nuw nsw i32 %tmp3, 8
	%tmp4 = getelementptr inbounds i8, i8* %tmp, i32 0			%tmp4 = getelementptr inbounds i8, i8* %tmp, i32 0
	%tmp5 = load i8, i8* %tmp4, align 2			%tmp5 = load i8, i8* %tmp4, align 2
	%tmp6 = zext i8 %tmp5 to i32			%tmp6 = zext i8 %tmp5 to i32
	%tmp7 = shl nuw nsw i32 %tmp6, 16			%tmp7 = shl nuw nsw i32 %tmp6, 16
	%tmp8 = or i32 %tmp7, %tmp30			%tmp8 = or i32 %tmp7, %tmp30
	ret i32 %tmp8			ret i32 %tmp8
	}			}

	; i8* p; // p is 2 byte aligned			; i8* p; // p is 2 byte aligned
	; ((i32) p[1] << 16) | ((i32) p[0] << 24)			; ((i32) p[1] << 16) | ((i32) p[0] << 24)
	define i32 @zext_load_i32_by_i8_bswap_shl_16(i32* %arg) {			define i32 @zext_load_i32_by_i8_bswap_shl_16(i32* %arg) {
	; CHECK-LABEL: zext_load_i32_by_i8_bswap_shl_16:			; CHECK-LABEL: zext_load_i32_by_i8_bswap_shl_16:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: ldrb w8, [x0, #1]			; CHECK-NEXT: ldrb w8, [x0, #1]
	; CHECK-NEXT: ldrb w9, [x0]			; CHECK-NEXT: ldrb w9, [x0]
	; CHECK-NEXT: lsl w0, w8, #16			; CHECK-NEXT: lsl w8, w8, #16
	; CHECK-NEXT: bfi w0, w9, #24, #8			; CHECK-NEXT: orr w0, w8, w9, lsl #24
	; CHECK-NEXT: ret			; CHECK-NEXT: ret

	%tmp = bitcast i32* %arg to i8*			%tmp = bitcast i32* %arg to i8*
	%tmp1 = getelementptr inbounds i8, i8* %tmp, i32 1			%tmp1 = getelementptr inbounds i8, i8* %tmp, i32 1
	%tmp2 = load i8, i8* %tmp1, align 1			%tmp2 = load i8, i8* %tmp1, align 1
	%tmp3 = zext i8 %tmp2 to i32			%tmp3 = zext i8 %tmp2 to i32
	%tmp30 = shl nuw nsw i32 %tmp3, 16			%tmp30 = shl nuw nsw i32 %tmp3, 16
	%tmp4 = getelementptr inbounds i8, i8* %tmp, i32 0			%tmp4 = getelementptr inbounds i8, i8* %tmp, i32 0
	...
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: ldr s0, [x0]			; CHECK-NEXT: ldr s0, [x0]
	; CHECK-NEXT: ushll v0.8h, v0.8b, #0			; CHECK-NEXT: ushll v0.8h, v0.8b, #0
	; CHECK-NEXT: umov w8, v0.h[2]			; CHECK-NEXT: umov w8, v0.h[2]
	; CHECK-NEXT: umov w9, v0.h[1]			; CHECK-NEXT: umov w9, v0.h[1]
	; CHECK-NEXT: umov w10, v0.h[3]			; CHECK-NEXT: umov w10, v0.h[3]
	; CHECK-NEXT: lsl w8, w8, #16			; CHECK-NEXT: lsl w8, w8, #16
	; CHECK-NEXT: bfi w8, w9, #8, #8			; CHECK-NEXT: bfi w8, w9, #8, #8
	; CHECK-NEXT: bfi w8, w10, #24, #8			; CHECK-NEXT: orr w8, w8, w10, lsl #24
	; CHECK-NEXT: str w8, [x1]			; CHECK-NEXT: str w8, [x1]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%ld = load <4 x i8>, <4 x i8>* %in, align 4			%ld = load <4 x i8>, <4 x i8>* %in, align 4

	%e2 = extractelement <4 x i8> %ld, i32 1			%e2 = extractelement <4 x i8> %ld, i32 1
	%e3 = extractelement <4 x i8> %ld, i32 2			%e3 = extractelement <4 x i8> %ld, i32 2
	%e4 = extractelement <4 x i8> %ld, i32 3			%e4 = extractelement <4 x i8> %ld, i32 3

	...

	define void @short_vector_to_i32_unused_high_i8(<4 x i8>* %in, i32* %out, i32* %p) {			define void @short_vector_to_i32_unused_high_i8(<4 x i8>* %in, i32* %out, i32* %p) {
	; CHECK-LABEL: short_vector_to_i32_unused_high_i8:			; CHECK-LABEL: short_vector_to_i32_unused_high_i8:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: ldr s0, [x0]			; CHECK-NEXT: ldr s0, [x0]
	; CHECK-NEXT: ldrh w9, [x0]			; CHECK-NEXT: ldrh w9, [x0]
	; CHECK-NEXT: ushll v0.8h, v0.8b, #0			; CHECK-NEXT: ushll v0.8h, v0.8b, #0
	; CHECK-NEXT: umov w8, v0.h[2]			; CHECK-NEXT: umov w8, v0.h[2]
	; CHECK-NEXT: bfi w9, w8, #16, #8			; CHECK-NEXT: orr w8, w9, w8, lsl #16
	; CHECK-NEXT: str w9, [x1]			; CHECK-NEXT: str w8, [x1]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%ld = load <4 x i8>, <4 x i8>* %in, align 4			%ld = load <4 x i8>, <4 x i8>* %in, align 4

	%e1 = extractelement <4 x i8> %ld, i32 0			%e1 = extractelement <4 x i8> %ld, i32 0
	%e2 = extractelement <4 x i8> %ld, i32 1			%e2 = extractelement <4 x i8> %ld, i32 1
	%e3 = extractelement <4 x i8> %ld, i32 2			%e3 = extractelement <4 x i8> %ld, i32 2

	%z0 = zext i8 %e1 to i32			%z0 = zext i8 %e1 to i32
	...
	define void @short_vector_to_i32_unused_low_i16(<4 x i8>* %in, i32* %out, i32* %p) {			define void @short_vector_to_i32_unused_low_i16(<4 x i8>* %in, i32* %out, i32* %p) {
	; CHECK-LABEL: short_vector_to_i32_unused_low_i16:			; CHECK-LABEL: short_vector_to_i32_unused_low_i16:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: ldr s0, [x0]			; CHECK-NEXT: ldr s0, [x0]
	; CHECK-NEXT: ushll v0.8h, v0.8b, #0			; CHECK-NEXT: ushll v0.8h, v0.8b, #0
	; CHECK-NEXT: umov w8, v0.h[3]			; CHECK-NEXT: umov w8, v0.h[3]
	; CHECK-NEXT: umov w9, v0.h[2]			; CHECK-NEXT: umov w9, v0.h[2]
	; CHECK-NEXT: lsl w8, w8, #24			; CHECK-NEXT: lsl w8, w8, #24
	; CHECK-NEXT: bfi w8, w9, #16, #8			; CHECK-NEXT: orr w8, w8, w9, lsl #16
	; CHECK-NEXT: str w8, [x1]			; CHECK-NEXT: str w8, [x1]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%ld = load <4 x i8>, <4 x i8>* %in, align 4			%ld = load <4 x i8>, <4 x i8>* %in, align 4

	%e3 = extractelement <4 x i8> %ld, i32 2			%e3 = extractelement <4 x i8> %ld, i32 2
	%e4 = extractelement <4 x i8> %ld, i32 3			%e4 = extractelement <4 x i8> %ld, i32 3

	%z2 = zext i8 %e3 to i32			%z2 = zext i8 %e3 to i32
	...

llvm/test/CodeGen/AArch64/logic-shift.ll

...
; CHECK-NEXT: ret
ret i8 %r		ret i8 %r
}		}

define i32 @or_fshr_wrong_shift(i32 %x, i32 %y) {		define i32 @or_fshr_wrong_shift(i32 %x, i32 %y) {
; CHECK-LABEL: or_fshr_wrong_shift:		; CHECK-LABEL: or_fshr_wrong_shift:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: orr w8, w0, w1		; CHECK-NEXT: orr w8, w0, w1
; CHECK-NEXT: lsr w8, w8, #26		; CHECK-NEXT: lsr w8, w8, #26
; CHECK-NEXT: bfi w8, w0, #7, #25		; CHECK-NEXT: orr w0, w8, w0, lsl #7
; CHECK-NEXT: mov w0, w8
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%or1 = or i32 %x, %y		%or1 = or i32 %x, %y
%sh1 = shl i32 %x, 7		%sh1 = shl i32 %x, 7
%sh2 = lshr i32 %or1, 26		%sh2 = lshr i32 %or1, 26
%r = or i32 %sh1, %sh2		%r = or i32 %sh1, %sh2
ret i32 %r		ret i32 %r
}		}
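In `or_fshr_wrong_shift`, `lsr w8, w8, #26` leaves at most six significant bits in `w8`, and `w0 << 7` has its low seven bits clear. The operands are therefore disjoint, so BFI and ORR agree, and the ORR form also writes `w0` directly, which removes the trailing `mov`. The Python sketch below uses hypothetical helper names and checks the old and new sequences against the IR on a few inputs.

```python
MASK32 = 0xFFFFFFFF


def bfi32(dst, src, lsb, width):
    """Model of `bfi Wd, Wn, #lsb, #width`."""
    field = ((1 << width) - 1) << lsb
    return ((dst & ~field) | ((src << lsb) & field)) & MASK32


def ir_result(x, y):
    # %or1 = or %x, %y ; %sh1 = shl %x, 7 ; %sh2 = lshr %or1, 26 ; %r = or %sh1, %sh2
    return ((x << 7) & MASK32) | ((x | y) >> 26)


def old_seq(x, y):
    w8 = ((x | y) & MASK32) >> 26     # orr w8, w0, w1 ; lsr w8, w8, #26
    return bfi32(w8, x, 7, 25)        # bfi w8, w0, #7, #25 ; mov w0, w8


def new_seq(x, y):
    w8 = ((x | y) & MASK32) >> 26     # orr w8, w0, w1 ; lsr w8, w8, #26
    return (w8 | (x << 7)) & MASK32   # orr w0, w8, w0, lsl #7


for x in (0, 1, 0x80000000, 0xFC000000, 0x12345678, MASK32):
    for y in (0, 0x04000000, 0xFC000000, MASK32):
        assert ir_result(x, y) == old_seq(x, y) == new_seq(x, y)
```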

llvm/test/CodeGen/AArch64/nontemporal-load.ll

	...
	; CHECK-NEXT: mov.d v0[1], x1			; CHECK-NEXT: mov.d v0[1], x1
	; CHECK-NEXT: ubfx x5, x10, #2, #1			; CHECK-NEXT: ubfx x5, x10, #2, #1
	; CHECK-NEXT: ubfx x7, x11, #3, #1			; CHECK-NEXT: ubfx x7, x11, #3, #1
	; CHECK-NEXT: fmov x0, d0			; CHECK-NEXT: fmov x0, d0
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	;			;
	; CHECK-BE-LABEL: test_ldnp_v4i65:			; CHECK-BE-LABEL: test_ldnp_v4i65:
	; CHECK-BE: // %bb.0:			; CHECK-BE: // %bb.0:
	; CHECK-BE-NEXT: ldp x9, x8, [x0, #16]			; CHECK-BE-NEXT: ldp x10, x9, [x0, #16]
	; CHECK-BE-NEXT: ldp x11, x10, [x0]			; CHECK-BE-NEXT: ldp x12, x11, [x0]
	; CHECK-BE-NEXT: ldrb w7, [x0, #32]			; CHECK-BE-NEXT: ldrb w8, [x0, #32]
	; CHECK-BE-NEXT: lsr x13, x9, #56			; CHECK-BE-NEXT: lsr x13, x10, #56
	; CHECK-BE-NEXT: lsr x14, x11, #56			; CHECK-BE-NEXT: lsr x14, x12, #56
	; CHECK-BE-NEXT: extr x15, x10, x9, #56			; CHECK-BE-NEXT: extr x15, x11, x10, #56
	; CHECK-BE-NEXT: bfi x7, x8, #8, #56			; CHECK-BE-NEXT: orr x7, x8, x9, lsl #8
	; CHECK-BE-NEXT: extr x8, x9, x8, #56			; CHECK-BE-NEXT: extr x8, x10, x9, #56
	; CHECK-BE-NEXT: extr x12, x11, x10, #56			; CHECK-BE-NEXT: extr x9, x12, x11, #56
	; CHECK-BE-NEXT: lsr x11, x11, #59			; CHECK-BE-NEXT: lsr x12, x12, #59
	; CHECK-BE-NEXT: ubfx x9, x9, #57, #1			; CHECK-BE-NEXT: ubfx x10, x10, #57, #1
	; CHECK-BE-NEXT: extr x5, x13, x8, #1			; CHECK-BE-NEXT: extr x5, x13, x8, #1
	; CHECK-BE-NEXT: extr x1, x14, x12, #3			; CHECK-BE-NEXT: extr x1, x14, x9, #3
	; CHECK-BE-NEXT: ubfx x12, x10, #58, #1			; CHECK-BE-NEXT: ubfx x9, x11, #58, #1
	; CHECK-BE-NEXT: fmov d0, x11			; CHECK-BE-NEXT: fmov d0, x12
	; CHECK-BE-NEXT: and x11, x8, #0x1			; CHECK-BE-NEXT: and x12, x8, #0x1
	; CHECK-BE-NEXT: lsr x10, x10, #56			; CHECK-BE-NEXT: lsr x11, x11, #56
	; CHECK-BE-NEXT: fmov d2, x9			; CHECK-BE-NEXT: fmov d2, x10
	; CHECK-BE-NEXT: fmov d1, x12			; CHECK-BE-NEXT: fmov d1, x9
	; CHECK-BE-NEXT: extr x3, x10, x15, #2			; CHECK-BE-NEXT: extr x3, x11, x15, #2
	; CHECK-BE-NEXT: fmov d3, x11			; CHECK-BE-NEXT: fmov d3, x12
	; CHECK-BE-NEXT: mov v0.d[1], x1			; CHECK-BE-NEXT: mov v0.d[1], x1
	; CHECK-BE-NEXT: mov v2.d[1], x5			; CHECK-BE-NEXT: mov v2.d[1], x5
	; CHECK-BE-NEXT: mov v1.d[1], x3			; CHECK-BE-NEXT: mov v1.d[1], x3
	; CHECK-BE-NEXT: mov v3.d[1], x7			; CHECK-BE-NEXT: mov v3.d[1], x7
	; CHECK-BE-NEXT: fmov x0, d0			; CHECK-BE-NEXT: fmov x0, d0
	; CHECK-BE-NEXT: fmov x4, d2			; CHECK-BE-NEXT: fmov x4, d2
	; CHECK-BE-NEXT: fmov x2, d1			; CHECK-BE-NEXT: fmov x2, d1
	; CHECK-BE-NEXT: fmov x6, d3			; CHECK-BE-NEXT: fmov x6, d3
	…

llvm/test/CodeGen/AArch64/rotate-extract.ll

…
; CHECK-NEXT: ret		; CHECK-NEXT: ret
ret i32 %out		ret i32 %out
}		}

; Can factor 128 from 2304, but result is 18 instead of 9		; Can factor 128 from 2304, but result is 18 instead of 9
define i64 @no_extract_mul(i64 %i) nounwind {		define i64 @no_extract_mul(i64 %i) nounwind {
; CHECK-LABEL: no_extract_mul:		; CHECK-LABEL: no_extract_mul:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: add x8, x0, x0, lsl #3		; CHECK-NEXT: add x8, x0, x0, lsl #3
; CHECK-NEXT: lsr x0, x8, #57		; CHECK-NEXT: lsr x9, x8, #57
; CHECK-NEXT: bfi x0, x8, #8, #56		; CHECK-NEXT: orr x0, x9, x8, lsl #8
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%lhs_mul = mul i64 %i, 2304		%lhs_mul = mul i64 %i, 2304
%rhs_mul = mul i64 %i, 9		%rhs_mul = mul i64 %i, 9
%rhs_shift = lshr i64 %rhs_mul, 57		%rhs_shift = lshr i64 %rhs_mul, 57
%out = or i64 %lhs_mul, %rhs_shift		%out = or i64 %lhs_mul, %rhs_shift
ret i64 %out		ret i64 %out
}		}

…
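In `@no_extract_mul` the same disjointness holds on 64-bit registers: the `lsr ..., x8, #57` result has at most bits 0-6 set, and `x8, lsl #8` starts at bit 8. Here `add x8, x0, x0, lsl #3` computes `9 * x0`, and `i * 2304` equals `(i * 9) << 8` because 2304 = 9 * 256. A minimal C++ sketch of both sequences follows; it assumes 64-bit unsigned wrap-around as the model of the i64 `mul`, the standard A64 `BFI` semantics, and uses hypothetical helper names (`bfi64`, `old_seq`, `new_seq`, `ir_ref`, `check_no_extract_mul`).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of A64 "BFI Xd, Xn, #lsb, #width" on 64-bit registers.
// Assumes lsb + width <= 64.
uint64_t bfi64(uint64_t xd, uint64_t xn, unsigned lsb, unsigned width) {
  const uint64_t field =
      (width >= 64) ? UINT64_MAX : ((uint64_t{1} << width) - 1);
  const uint64_t mask = field << lsb;
  return (xd & ~mask) | ((xn << lsb) & mask);
}

// Reference semantics of @no_extract_mul (i64 mul wraps modulo 2^64).
uint64_t ir_ref(uint64_t i) {
  return (i * 2304) | ((i * 9) >> 57);
}

// Before: add x8, x0, x0, lsl #3; lsr x0, x8, #57; bfi x0, x8, #8, #56
uint64_t old_seq(uint64_t x0) {
  const uint64_t x8 = x0 + (x0 << 3);  // x0 * 9
  uint64_t out = x8 >> 57;             // only bits 0-6 can be set
  out = bfi64(out, x8, 8, 56);         // bits 8-63 <- x8 bits 0-55
  return out;
}

// After: add x8, x0, x0, lsl #3; lsr x9, x8, #57; orr x0, x9, x8, lsl #8
uint64_t new_seq(uint64_t x0) {
  const uint64_t x8 = x0 + (x0 << 3);
  const uint64_t x9 = x8 >> 57;
  return x9 | (x8 << 8);
}

void check_no_extract_mul(uint64_t i) {
  assert(old_seq(i) == new_seq(i));
  assert(new_seq(i) == ir_ref(i));
}
```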

llvm/test/CodeGen/AArch64/trunc-to-tbl.ll

…
loop:		loop:
%iv.next = add i64 %iv, 1		%iv.next = add i64 %iv, 1
%ec = icmp eq i64 %iv.next, 1000		%ec = icmp eq i64 %iv.next, 1000
br i1 %ec, label %loop, label %exit		br i1 %ec, label %loop, label %exit

exit:		exit:
ret void		ret void
}		}

define void @trunc_v16i64_to_v16i8_in_loop(ptr %A, ptr %dst) {		define void @trunc_v16i64_to_v16i8_in_loop(ptr %A, ptr %dst) {
; CHECK-LABEL: trunc_v16i64_to_v16i8_in_loop:		; CHECK-LABEL: trunc_v16i64_to_v16i8_in_loop:
		mingmingl (Author): This test case is generated by `utils/update_llc_test_checks.py`, but for some reason the whitespace changes produce a larger diff than expected. I'm going to run the auto-updater on a clean branch and see whether the whitespace diff also appears without this patch.
		dmgreen: Feel free to regenerate the files that need it and check those in to reduce the differences here.
		mingmingl (Author): Thanks! Done in https://reviews.llvm.org/D137296
; CHECK: ; %bb.0: ; %entry		; CHECK: ; %bb.0: ; %entry
; CHECK-NEXT: mov x8, xzr		; CHECK-NEXT: mov x8, xzr
; CHECK-NEXT: LBB3_1: ; %loop		; CHECK-NEXT: LBB3_1: ; %loop
; CHECK-NEXT: ; =>This Inner Loop Header: Depth=1		; CHECK-NEXT: ; =>This Inner Loop Header: Depth=1
; CHECK-NEXT: add x9, x0, x8, lsl #7		; CHECK-NEXT: add x9, x0, x8, lsl #7
; CHECK-NEXT: ldp q3, q2, [x9, #96]		; CHECK-NEXT: ldp q3, q2, [x9, #96]
; CHECK-NEXT: ldp q1, q0, [x9, #32]		; CHECK-NEXT: ldp q1, q0, [x9, #32]
; CHECK-NEXT: uzp1.4s v2, v3, v2		; CHECK-NEXT: uzp1.4s v2, v3, v2
…
; CHECK-NEXT: ldp x10, x9, [x0]		; CHECK-NEXT: ldp x10, x9, [x0]
; CHECK-NEXT: ldrb w11, [x0, #18]		; CHECK-NEXT: ldrb w11, [x0, #18]
; CHECK-NEXT: ldrh w13, [x0, #16]		; CHECK-NEXT: ldrh w13, [x0, #16]
; CHECK-NEXT: add x0, x0, #32		; CHECK-NEXT: add x0, x0, #32
; CHECK-NEXT: lsr x14, x10, #19		; CHECK-NEXT: lsr x14, x10, #19
; CHECK-NEXT: fmov s0, w10		; CHECK-NEXT: fmov s0, w10
; CHECK-NEXT: ubfx x12, x9, #12, #20		; CHECK-NEXT: ubfx x12, x9, #12, #20
; CHECK-NEXT: lsr x15, x9, #31		; CHECK-NEXT: lsr x15, x9, #31
; CHECK-NEXT: bfi w13, w11, #16, #8		; CHECK-NEXT: orr w11, w13, w11, lsl #16
; CHECK-NEXT: lsr x11, x9, #50		; CHECK-NEXT: lsr x13, x9, #50
; CHECK-NEXT: mov.s v0[1], w14		; CHECK-NEXT: mov.s v0[1], w14
; CHECK-NEXT: fmov s1, w12		; CHECK-NEXT: fmov s1, w12
; CHECK-NEXT: lsr x12, x10, #38		; CHECK-NEXT: lsr x12, x10, #38
; CHECK-NEXT: bfi w11, w13, #14, #18		; CHECK-NEXT: orr w13, w13, w11, lsl #14
; CHECK-NEXT: lsr x10, x10, #57		; CHECK-NEXT: lsr x10, x10, #57
; CHECK-NEXT: bfi w10, w9, #7, #25		; CHECK-NEXT: orr w9, w10, w9, lsl #7
; CHECK-NEXT: lsr w9, w13, #5		; CHECK-NEXT: lsr w10, w11, #5
; CHECK-NEXT: mov.s v1[1], w15		; CHECK-NEXT: mov.s v1[1], w15
; CHECK-NEXT: mov.s v0[2], w12		; CHECK-NEXT: mov.s v0[2], w12
; CHECK-NEXT: mov.s v1[2], w11		; CHECK-NEXT: mov.s v1[2], w13
; CHECK-NEXT: mov.s v0[3], w10		; CHECK-NEXT: mov.s v0[3], w9
; CHECK-NEXT: mov.s v1[3], w9		; CHECK-NEXT: mov.s v1[3], w10
; CHECK-NEXT: uzp1.8h v0, v0, v1		; CHECK-NEXT: uzp1.8h v0, v0, v1
; CHECK-NEXT: xtn.8b v0, v0		; CHECK-NEXT: xtn.8b v0, v0
; CHECK-NEXT: str d0, [x1, x8, lsl #3]		; CHECK-NEXT: str d0, [x1, x8, lsl #3]
; CHECK-NEXT: add x8, x8, #1		; CHECK-NEXT: add x8, x8, #1
; CHECK-NEXT: cmp x8, #1000		; CHECK-NEXT: cmp x8, #1000
; CHECK-NEXT: b.eq LBB5_1		; CHECK-NEXT: b.eq LBB5_1
; CHECK-NEXT: ; %bb.2: ; %exit		; CHECK-NEXT: ; %bb.2: ; %exit
; CHECK-NEXT: ret		; CHECK-NEXT: ret
;		;
; CHECK-BE-LABEL: trunc_v8i19_to_v8i8_in_loop:		; CHECK-BE-LABEL: trunc_v8i19_to_v8i8_in_loop:
; CHECK-BE: // %bb.0: // %entry		; CHECK-BE: // %bb.0: // %entry
; CHECK-BE-NEXT: mov x8, xzr		; CHECK-BE-NEXT: mov x8, xzr
; CHECK-BE-NEXT: .LBB5_1: // %loop		; CHECK-BE-NEXT: .LBB5_1: // %loop
; CHECK-BE-NEXT: // =>This Inner Loop Header: Depth=1		; CHECK-BE-NEXT: // =>This Inner Loop Header: Depth=1
; CHECK-BE-NEXT: ldp x10, x9, [x0]		; CHECK-BE-NEXT: ldp x10, x9, [x0]
; CHECK-BE-NEXT: ldrh w15, [x0, #16]		; CHECK-BE-NEXT: ldrh w11, [x0, #16]
; CHECK-BE-NEXT: lsr x12, x10, #40
; CHECK-BE-NEXT: lsr x13, x10, #45		; CHECK-BE-NEXT: lsr x13, x10, #45
; CHECK-BE-NEXT: lsr x11, x9, #40		; CHECK-BE-NEXT: lsr x15, x10, #40
		; CHECK-BE-NEXT: lsr x12, x9, #40
; CHECK-BE-NEXT: ubfx x14, x9, #33, #7		; CHECK-BE-NEXT: ubfx x14, x9, #33, #7
; CHECK-BE-NEXT: ubfx x16, x10, #26, #14		; CHECK-BE-NEXT: ubfx x16, x10, #26, #14
; CHECK-BE-NEXT: bfi w16, w12, #14, #18		; CHECK-BE-NEXT: orr w12, w14, w12, lsl #7
; CHECK-BE-NEXT: ubfx x12, x9, #14, #18		; CHECK-BE-NEXT: ldrb w14, [x0, #18]
; CHECK-BE-NEXT: bfi w14, w11, #7, #24		; CHECK-BE-NEXT: orr w15, w16, w15, lsl #14
; CHECK-BE-NEXT: ldrb w11, [x0, #18]
; CHECK-BE-NEXT: fmov s0, w13		; CHECK-BE-NEXT: fmov s0, w13
; CHECK-BE-NEXT: add x0, x0, #32		; CHECK-BE-NEXT: add x0, x0, #32
; CHECK-BE-NEXT: fmov s1, w14		; CHECK-BE-NEXT: fmov s1, w12
; CHECK-BE-NEXT: bfi w11, w15, #8, #16		; CHECK-BE-NEXT: ubfx x12, x9, #14, #18
; CHECK-BE-NEXT: mov v0.s[1], w16		; CHECK-BE-NEXT: orr w11, w14, w11, lsl #8
		; CHECK-BE-NEXT: mov v0.s[1], w15
; CHECK-BE-NEXT: mov v1.s[1], w12		; CHECK-BE-NEXT: mov v1.s[1], w12
; CHECK-BE-NEXT: extr x12, x10, x9, #40		; CHECK-BE-NEXT: extr x12, x10, x9, #40
; CHECK-BE-NEXT: lsl x9, x9, #24		; CHECK-BE-NEXT: lsl x9, x9, #24
; CHECK-BE-NEXT: ubfx x10, x10, #7, #25		; CHECK-BE-NEXT: ubfx x10, x10, #7, #25
; CHECK-BE-NEXT: orr w9, w11, w9		; CHECK-BE-NEXT: orr w9, w11, w9
; CHECK-BE-NEXT: lsr w9, w9, #19		; CHECK-BE-NEXT: lsr w9, w9, #19
; CHECK-BE-NEXT: mov v0.s[2], w10		; CHECK-BE-NEXT: mov v0.s[2], w10
; CHECK-BE-NEXT: ubfx x10, x12, #12, #20		; CHECK-BE-NEXT: ubfx x10, x12, #12, #20
…

llvm/test/CodeGen/AArch64/urem-seteq.ll

	…

	define i16 @test_urem_even(i16 %X) nounwind {			define i16 @test_urem_even(i16 %X) nounwind {
	; CHECK-LABEL: test_urem_even:			; CHECK-LABEL: test_urem_even:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: mov w8, #28087			; CHECK-NEXT: mov w8, #28087
	; CHECK-NEXT: mul w8, w0, w8			; CHECK-NEXT: mul w8, w0, w8
	; CHECK-NEXT: and w9, w8, #0xfffc			; CHECK-NEXT: and w9, w8, #0xfffc
	; CHECK-NEXT: lsr w9, w9, #1			; CHECK-NEXT: lsr w9, w9, #1
	; CHECK-NEXT: bfi w9, w8, #15, #17			; CHECK-NEXT: orr w8, w9, w8, lsl #15
	; CHECK-NEXT: ubfx w8, w9, #1, #15			; CHECK-NEXT: ubfx w8, w8, #1, #15
	; CHECK-NEXT: cmp w8, #2340			; CHECK-NEXT: cmp w8, #2340
	; CHECK-NEXT: cset w0, hi			; CHECK-NEXT: cset w0, hi
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%urem = urem i16 %X, 14			%urem = urem i16 %X, 14
	%cmp = icmp ne i16 %urem, 0			%cmp = icmp ne i16 %urem, 0
	%ret = zext i1 %cmp to i16			%ret = zext i1 %cmp to i16
	ret i16 %ret			ret i16 %ret
	}			}
	…
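The `urem-seteq.ll` hunk follows the same pattern. After `and w9, w8, #0xfffc` and `lsr w9, w9, #1`, only bits 1-14 of `w9` can be set, and `w8, lsl #15` starts at bit 15, so `bfi w9, w8, #15, #17` and `orr w8, w9, w8, lsl #15` produce the same word. The sketch below checks both sequences against the IR (`urem i16 %X, 14` compared with 0) for every 16-bit input. For context, 28087 is the inverse of 7 modulo 2^16 (7 * 28087 = 196609 = 3 * 65536 + 1), which the multiply-based divisibility test relies on. Helper names are hypothetical, `w0` is modelled as a 32-bit value whose low 16 bits hold `%X`, and `BFI` is assumed to have its standard A64 semantics.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of A64 "BFI Wd, Wn, #lsb, #width" (lsb + width <= 32).
uint32_t bfi32(uint32_t wd, uint32_t wn, unsigned lsb, unsigned width) {
  const uint32_t field = (width >= 32) ? UINT32_MAX : ((1u << width) - 1u);
  const uint32_t mask = field << lsb;
  return (wd & ~mask) | ((wn << lsb) & mask);
}

// Before: mov w8, #28087; mul w8, w0, w8; and w9, w8, #0xfffc; lsr w9, w9, #1
//         bfi w9, w8, #15, #17; ubfx w8, w9, #1, #15; cmp w8, #2340; cset w0, hi
uint32_t old_seq(uint32_t w0) {
  const uint32_t w8 = w0 * 28087u;            // 32-bit wrapping multiply
  uint32_t w9 = (w8 & 0xfffcu) >> 1;          // only bits 1-14 can be set
  w9 = bfi32(w9, w8, 15, 17);                 // bits 15-31 <- w8 bits 0-16
  const uint32_t field = (w9 >> 1) & 0x7fffu; // ubfx #1, #15
  return field > 2340u ? 1u : 0u;             // cset hi (unsigned >)
}

// After: same, but "orr w8, w9, w8, lsl #15" replaces the bfi.
uint32_t new_seq(uint32_t w0) {
  const uint32_t w8 = w0 * 28087u;
  const uint32_t w9 = (w8 & 0xfffcu) >> 1;
  const uint32_t merged = w9 | (w8 << 15);
  const uint32_t field = (merged >> 1) & 0x7fffu;
  return field > 2340u ? 1u : 0u;
}

// Reference semantics of @test_urem_even: zext(urem X, 14 != 0).
uint32_t ir_ref(uint16_t x) { return (x % 14 != 0) ? 1u : 0u; }

// Exhaustively compare all three over every i16 input.
void check_test_urem_even() {
  for (uint32_t x = 0; x <= 0xFFFF; ++x) {
    assert(old_seq(x) == new_seq(x));
    assert(new_seq(x) == ir_ref(static_cast<uint16_t>(x)));
  }
}
```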

llvm/test/CodeGen/AArch64/vec_uaddo.ll

	…

	define <4 x i32> @uaddo_v4i1(<4 x i1> %a0, <4 x i1> %a1, <4 x i1>* %p2) nounwind {			define <4 x i32> @uaddo_v4i1(<4 x i1> %a0, <4 x i1> %a1, <4 x i1>* %p2) nounwind {
	; CHECK-LABEL: uaddo_v4i1:			; CHECK-LABEL: uaddo_v4i1:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: movi v2.4h, #1			; CHECK-NEXT: movi v2.4h, #1
	; CHECK-NEXT: and v1.8b, v1.8b, v2.8b			; CHECK-NEXT: and v1.8b, v1.8b, v2.8b
	; CHECK-NEXT: and v0.8b, v0.8b, v2.8b			; CHECK-NEXT: and v0.8b, v0.8b, v2.8b
	; CHECK-NEXT: add v0.4h, v0.4h, v1.4h			; CHECK-NEXT: add v0.4h, v0.4h, v1.4h
	; CHECK-NEXT: umov w8, v0.h[1]			; CHECK-NEXT: umov w8, v0.h[0]
	; CHECK-NEXT: umov w9, v0.h[0]			; CHECK-NEXT: umov w9, v0.h[1]
	; CHECK-NEXT: umov w10, v0.h[2]			; CHECK-NEXT: umov w10, v0.h[2]
	; CHECK-NEXT: umov w11, v0.h[3]			; CHECK-NEXT: umov w11, v0.h[3]
	; CHECK-NEXT: and v1.8b, v0.8b, v2.8b			; CHECK-NEXT: and v1.8b, v0.8b, v2.8b
	; CHECK-NEXT: cmeq v0.4h, v1.4h, v0.4h			; CHECK-NEXT: cmeq v0.4h, v1.4h, v0.4h
	; CHECK-NEXT: bfi w9, w8, #1, #1			; CHECK-NEXT: and w8, w8, #0x1
	; CHECK-NEXT: bfi w9, w10, #2, #1			; CHECK-NEXT: bfi w8, w9, #1, #1
	; CHECK-NEXT: mvn v0.8b, v0.8b			; CHECK-NEXT: mvn v0.8b, v0.8b
	; CHECK-NEXT: bfi w9, w11, #3, #29			; CHECK-NEXT: bfi w8, w10, #2, #1
	; CHECK-NEXT: and w8, w9, #0xf			; CHECK-NEXT: orr w8, w8, w11, lsl #3
				; CHECK-NEXT: and w8, w8, #0xf
	; CHECK-NEXT: sshll v0.4s, v0.4h, #0			; CHECK-NEXT: sshll v0.4s, v0.4h, #0
	; CHECK-NEXT: strb w8, [x0]			; CHECK-NEXT: strb w8, [x0]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%t = call {<4 x i1>, <4 x i1>} @llvm.uadd.with.overflow.v4i1(<4 x i1> %a0, <4 x i1> %a1)			%t = call {<4 x i1>, <4 x i1>} @llvm.uadd.with.overflow.v4i1(<4 x i1> %a0, <4 x i1> %a1)
	%val = extractvalue {<4 x i1>, <4 x i1>} %t, 0			%val = extractvalue {<4 x i1>, <4 x i1>} %t, 0
	%obit = extractvalue {<4 x i1>, <4 x i1>} %t, 1			%obit = extractvalue {<4 x i1>, <4 x i1>} %t, 1
	%res = sext <4 x i1> %obit to <4 x i32>			%res = sext <4 x i1> %obit to <4 x i32>
	store <4 x i1> %val, <4 x i1>* %p2			store <4 x i1> %val, <4 x i1>* %p2
	…
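In `uaddo_v4i1` the four result lanes are packed into a 4-bit value before the `strb`. The new code has one more scalar instruction, `and w8, w8, #0x1`, which keeps only bit 0 of lane 0. This matters for correctness because `orr w8, w8, w11, lsl #3` does not clear bits 3 and above of `w8`, whereas `bfi w9, w11, #3, #29` overwrote them. The C++ sketch below checks that both packings yield "bit k = bit 0 of lane k"; the helper names are hypothetical, lanes are modelled as the zero-extended 16-bit values produced by `umov`, and standard A64 `BFI` semantics are assumed. The same pattern appears in `umulo_v4i1` in `vec_umulo.ll`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of A64 "BFI Wd, Wn, #lsb, #width" (lsb + width <= 32).
uint32_t bfi32(uint32_t wd, uint32_t wn, unsigned lsb, unsigned width) {
  const uint32_t field = (width >= 32) ? UINT32_MAX : ((1u << width) - 1u);
  const uint32_t mask = field << lsb;
  return (wd & ~mask) | ((wn << lsb) & mask);
}

// Before: umov w8, v0.h[1]; umov w9, v0.h[0]; umov w10, v0.h[2]; umov w11, v0.h[3]
//         bfi w9, w8, #1, #1; bfi w9, w10, #2, #1; bfi w9, w11, #3, #29; and w8, w9, #0xf
uint32_t old_pack(uint16_t h0, uint16_t h1, uint16_t h2, uint16_t h3) {
  uint32_t w8 = h1, w9 = h0, w10 = h2, w11 = h3;
  w9 = bfi32(w9, w8, 1, 1);
  w9 = bfi32(w9, w10, 2, 1);
  w9 = bfi32(w9, w11, 3, 29);  // also clears every bit of w9 above bit 3
  return w9 & 0xfu;
}

// After: umov w8, v0.h[0]; umov w9, v0.h[1]; umov w10, v0.h[2]; umov w11, v0.h[3]
//        and w8, w8, #0x1; bfi w8, w9, #1, #1; bfi w8, w10, #2, #1
//        orr w8, w8, w11, lsl #3; and w8, w8, #0xf
uint32_t new_pack(uint16_t h0, uint16_t h1, uint16_t h2, uint16_t h3) {
  uint32_t w8 = h0, w9 = h1, w10 = h2, w11 = h3;
  w8 &= 0x1u;                  // without this, bit 3 of h0 would leak through the orr
  w8 = bfi32(w8, w9, 1, 1);
  w8 = bfi32(w8, w10, 2, 1);
  w8 |= w11 << 3;
  return w8 & 0xfu;
}

// Expected packing: bit k of the stored nibble is bit 0 of lane k.
uint32_t ref_pack(uint16_t h0, uint16_t h1, uint16_t h2, uint16_t h3) {
  return (h0 & 1u) | ((h1 & 1u) << 1) | ((h2 & 1u) << 2) | ((h3 & 1u) << 3);
}

void check_pack(uint16_t h0, uint16_t h1, uint16_t h2, uint16_t h3) {
  assert(old_pack(h0, h1, h2, h3) == new_pack(h0, h1, h2, h3));
  assert(new_pack(h0, h1, h2, h3) == ref_pack(h0, h1, h2, h3));
}
```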

llvm/test/CodeGen/AArch64/vec_umulo.ll

	…
	}			}

	define <4 x i32> @umulo_v4i1(<4 x i1> %a0, <4 x i1> %a1, <4 x i1>* %p2) nounwind {			define <4 x i32> @umulo_v4i1(<4 x i1> %a0, <4 x i1> %a1, <4 x i1>* %p2) nounwind {
	; CHECK-LABEL: umulo_v4i1:			; CHECK-LABEL: umulo_v4i1:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: fmov d2, d0			; CHECK-NEXT: fmov d2, d0
	; CHECK-NEXT: movi v0.2d, #0000000000000000			; CHECK-NEXT: movi v0.2d, #0000000000000000
	; CHECK-NEXT: and v1.8b, v2.8b, v1.8b			; CHECK-NEXT: and v1.8b, v2.8b, v1.8b
	; CHECK-NEXT: umov w8, v1.h[1]			; CHECK-NEXT: umov w8, v1.h[0]
	; CHECK-NEXT: umov w9, v1.h[0]			; CHECK-NEXT: umov w9, v1.h[1]
	; CHECK-NEXT: umov w10, v1.h[2]			; CHECK-NEXT: umov w10, v1.h[2]
	; CHECK-NEXT: umov w11, v1.h[3]			; CHECK-NEXT: umov w11, v1.h[3]
	; CHECK-NEXT: bfi w9, w8, #1, #1			; CHECK-NEXT: and w8, w8, #0x1
	; CHECK-NEXT: bfi w9, w10, #2, #1			; CHECK-NEXT: bfi w8, w9, #1, #1
	; CHECK-NEXT: bfi w9, w11, #3, #29			; CHECK-NEXT: bfi w8, w10, #2, #1
	; CHECK-NEXT: and w8, w9, #0xf			; CHECK-NEXT: orr w8, w8, w11, lsl #3
				; CHECK-NEXT: and w8, w8, #0xf
	; CHECK-NEXT: strb w8, [x0]			; CHECK-NEXT: strb w8, [x0]
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%t = call {<4 x i1>, <4 x i1>} @llvm.umul.with.overflow.v4i1(<4 x i1> %a0, <4 x i1> %a1)			%t = call {<4 x i1>, <4 x i1>} @llvm.umul.with.overflow.v4i1(<4 x i1> %a0, <4 x i1> %a1)
	%val = extractvalue {<4 x i1>, <4 x i1>} %t, 0			%val = extractvalue {<4 x i1>, <4 x i1>} %t, 0
	%obit = extractvalue {<4 x i1>, <4 x i1>} %t, 1			%obit = extractvalue {<4 x i1>, <4 x i1>} %t, 1
	%res = sext <4 x i1> %obit to <4 x i32>			%res = sext <4 x i1> %obit to <4 x i32>
	store <4 x i1> %val, <4 x i1>* %p2			store <4 x i1> %val, <4 x i1>* %p2
	ret <4 x i32> %res			ret <4 x i32> %res
	…

Revision Contents

Diff 473005

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp

llvm/test/CodeGen/AArch64/arm64-bitfield-extract.ll

llvm/test/CodeGen/AArch64/arm64-non-pow2-ldst.ll

llvm/test/CodeGen/AArch64/arm64-strict-align.ll

llvm/test/CodeGen/AArch64/arm64_32.ll

llvm/test/CodeGen/AArch64/bfis-in-loop.ll

llvm/test/CodeGen/AArch64/bitfield-insert.ll

llvm/test/CodeGen/AArch64/build-pair-isel.ll

llvm/test/CodeGen/AArch64/funnel-shift-rot.ll

llvm/test/CodeGen/AArch64/load-combine-big-endian.ll

llvm/test/CodeGen/AArch64/load-combine.ll

llvm/test/CodeGen/AArch64/logic-shift.ll

llvm/test/CodeGen/AArch64/nontemporal-load.ll

llvm/test/CodeGen/AArch64/rotate-extract.ll

llvm/test/CodeGen/AArch64/trunc-to-tbl.ll

llvm/test/CodeGen/AArch64/urem-seteq.ll

llvm/test/CodeGen/AArch64/vec_uaddo.ll

llvm/test/CodeGen/AArch64/vec_umulo.ll
