This is an archive of the discontinued LLVM Phabricator instance.


Differential D60294

[DAGCombiner] [CodeGenPrepare] Split large offsets from base addresses
ClosedPublic

Authored by luismarques on Apr 4 2019, 4:42 PM.

Details

Reviewers

efriedma
spatel
asb

Commits

rG2e46312ffd16: [DAGCombiner] [CodeGenPrepare] More comprehensive GEP splitting
rL363544: [DAGCombiner] [CodeGenPrepare] More comprehensive GEP splitting

Summary

This patch addresses the following issue that came up in the context of RISC-V benchmarks, but which affects other targets. Suppose you have several loads/stores that access array elements or struct fields with large offsets:

void foo(int *x, int *y) {
    y[0] = x[0x10001];
    y[1] = x[0x10002];
    y[2] = x[0x10003];
    ...
}

On a target such as RISC-V you cannot add 0x10001 to the address of x in a single instruction (the constant doesn't fit the 12-bit signed immediate), so the generated code more closely resembles this:

void foo(int *x, int *y) {
    y[0] = *(x+0x10000+1);
    y[1] = *(x+0x10000+2);
    y[2] = *(x+0x10000+3);
    ...
}

But the +1, +2, etc. can be folded into the immediate offset of the load/store instructions, so you can effectively have something like this:

void foo(int *x, int *y) {
    int *base = &x[0x10001];
    y[0] = base[0];
    y[1] = base[1];
    y[2] = base[2];
    ...
}

That optimization can only be performed, though, if the +1, +2, etc. are split from the 0x10000. Fortunately, there is already a target hook that indicates we want such an address split to occur: shouldConsiderGEPOffsetSplit. When that hook returns true, CodeGenPrepare.cpp adds the GEPs with large offsets to a list of GEPs to be split, and ::splitLargeGEPOffsets splits them, in a process clearly illustrated in that method's comments. Unfortunately, the split currently only occurs when the base and the GEP are in different basic blocks (BBs), since the DAGCombiner would just recombine those in the same BB anyway.
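To make the arithmetic of such a split concrete, here is a minimal sketch. It is not the algorithm used by ::splitLargeGEPOffsets; the helper names fitsSimm12 and splitOffset are hypothetical and only illustrate how a byte offset can be separated into a large part that is added to the base once and a small remainder that fits a 12-bit signed immediate.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical helper (not an LLVM API): true if Offset fits the 12-bit
// signed immediate mentioned in the summary.
constexpr bool fitsSimm12(int64_t Offset) {
  return Offset >= -2048 && Offset <= 2047;
}

// Hypothetical helper (not an LLVM API): split a byte offset into
// {Hi, Lo} such that Hi + Lo == Offset and Lo fits a 12-bit signed
// immediate. Lo is the sign-extended low 12 bits of Offset.
inline std::pair<int64_t, int64_t> splitOffset(int64_t Offset) {
  int64_t Lo = ((Offset & 0xFFF) ^ 0x800) - 0x800;
  int64_t Hi = Offset - Lo;
  return {Hi, Lo};
}
```

For x[0x10001] with 4-byte ints the byte offset is 0x40004, which this sketch splits into 0x40000 for the base and 4 for the load/store immediate.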

This patch intends to:

make the split also occur in the cases where the base and the GEP are in the same BB (that's often the case);
ensure that the DAGCombiner doesn't reassociate them back again.

To achieve that second step the patch adds a check before the reassociation of add instructions to see if the sum is used by loads or stores and if reassociating could break a reg+imm addressing mode for those loads/stores. This strategy seems to work, as shown in the tests.
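The shape of that check can be sketched outside of LLVM as follows. This is a toy model under stated assumptions: isLegalRegImm stands in for the target's addressing-mode query and hard-codes a 12-bit signed immediate, and the User struct and function name are hypothetical. It omits the opcode, constant-width and single-use checks of the real patch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy stand-in for the target's reg+imm legality query; assumes a
// 12-bit signed immediate, as on RISC-V.
inline bool isLegalRegImm(int64_t Offset) {
  return Offset >= -2048 && Offset <= 2047;
}

// Hypothetical model of one user of the sum being examined.
struct User {
  bool IsMemoryAccess; // true for a load or a store
};

// Returns true if folding (add (add x, C1), C2) into (add x, C1 + C2)
// could turn a legal reg+imm access into an illegal one.
inline bool reassociationCouldBreakAddressing(int64_t C1, int64_t C2,
                                              const std::vector<User> &Users) {
  for (const User &U : Users) {
    if (!U.IsMemoryAccess)
      continue;
    // If C2 alone cannot be folded into the access, nothing is lost.
    if (!isLegalRegImm(C2))
      continue;
    // C2 folds today; would the combined constant still fold?
    if (!isLegalRegImm(C1 + C2))
      return true;
  }
  return false;
}
```

With C1 = 0x40000 and C2 = 4 and at least one memory user, the sketch reports that reassociation would break the addressing mode, so the fold would be skipped.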

A possible alternative would be to add a RISC-V specific pass to split the addresses, but solving this problem in a more generic fashion is probably preferable, as it avoids duplication of functionality and can benefit other targets.

(It might be possible to address https://bugs.llvm.org/show_bug.cgi?id=24447 by making the address mode checks more stringent for X86, etc.)

Diff Detail

Repository: rL LLVM

Event Timeline

luismarques created this revision. Apr 4 2019, 4:42 PM

Herald added a project: Restricted Project. Apr 4 2019, 4:42 PM

Herald added subscribers: llvm-commits, jocewei, PkmX and 16 others.

luismarques edited the summary of this revision. Apr 4 2019, 4:45 PM

luismarques edited the summary of this revision. Apr 4 2019, 4:48 PM

Should we care about the number of uses of N0 here? It should generally be "safe" to reassociate in cases where it only has one use.

Given that we're splitting the GEPs before SelectionDAG, the only way to preserve the optimization is to avoid re-merging them... and I can't think of any reasonable approach for that other than something along the lines of this patch. Well, I guess we could just completely disable folding without regard for whether the addressing mode is legal, but that probably interacts badly with type legalization.

The alternative, as you note, is to do something after isel. Arguably that would be more effective. At that point you have a better idea of what constants are actually necessary, and you could integrate it with other similar optimizations like optimizing related integer constants. This is sort of along the lines of RISCVMergeBaseOffset.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
1040	This computation of AccessTy is weird: it's supposed to be the type of the load, not the type of the pointer. How you get the right access type is sort of awkward, of course... I guess you could traverse the use list.

In D60294#1456609, @efriedma wrote:

Given that we're splitting the GEPs before SelectionDAG, the only way to preserve the optimization is to avoid re-merging them... and I can't think of any reasonable approach for that other than something along the lines of this patch. Well, I guess we could just completely disable folding without regard for whether the addressing mode is legal, but that probably interacts badly with type legalization.

The alternative, as you note, is to do something after isel. Arguably that would be more effective. At that point you have a better idea of what constants are actually necessary, and you could integrate it with other similar optimizations like optimizing related integer constants. This is sort of along the lines of RISCVMergeBaseOffset.

My feeling is that if there are no major objections to disabling the fold in this case (e.g. an undocumented requirement that adds with constants are always canonicalised as they currently are), then it seems worth adding logic like this to disable the re-merging on the basis that 1) it's not a huge amount of code that serves to preserve work that's already done in CodeGenPrepare anyway, 2) it's easier for other backends to make use of it. I agree there may later be scope for a pass that runs after ISel that might be wider in scope. But for this transformation, it seems a shame to have a situation where CodeGenPrepare does the necessary analysis+transformation, the DAGCombiner undoes that work, then a very similar analysis+transformation is done again on the SelectionDAG.

asb added inline comments. Apr 11 2019, 5:39 AM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
1040	Traversing the use list as callers to canFoldInAddressingMode do seems sensible. This function can identify the first load/store operation and use the type from that. Bail out if none of the uses are load/stores. By doing this we can also get the right address space. It might be worth exploring if we can just call canFoldInAddressingMode rather than replicating similar logic here.

Passing-by comment: can/will this do anything to e.g. the testcase https://reviews.llvm.org/D59535#1458118 which resulted in the revert of D59535?
The problem being solved here seems /related/.

Not sure if it's relevant to what this patch is trying to do, but x86 has a similar example described here:
https://bugs.llvm.org/show_bug.cgi?id=24447

In D60294#1462628, @spatel wrote:

Not sure if it's relevant to what this patch is trying to do, but x86 has a similar example described here:
https://bugs.llvm.org/show_bug.cgi?id=24447

Thanks for the link. This has the potential to help in the case that the large constants were produced by getelementptrs. Except the logic here and in CodeGenPrepare's splitgep routines checks isLegalAddressingMode, which may not be selective enough for X86 given its addressing modes are so unrestricted.

dmgreen added a subscriber: dmgreen. Apr 11 2019, 12:28 PM

Hello. This looks like an interesting patch. Thanks for working on it. I ran some numbers and on Thumb1 targets (where resources are generally very constrained) this looks like a nice improvement.

Things didn't look as good on Thumb2 (and AArch64), but that might be that something isn't tuned correctly, or something that's just going wrong. I'll try and take a look (in the morning). Don't feel that that should block you here, I think something odd might be going on with floating point constants in soft-fp? I'm not sure yet.

In D60294#1463517, @dmgreen wrote:

Hello. This looks like an interesting patch. Thanks for working on it. I ran some numbers and on Thumb1 targets (where resources are generally very constrained) this looks like a nice improvement.
Things didn't look as good on Thumb2 (and AArch64), but that might be that something isn't tuned correctly, or something that's just going wrong. I'll try and take a look (in the morning). Don't feel that that should block you here, I think something odd might be going on with floating point constants in soft-fp? I'm not sure yet.

My reply through the email seems to have bounced, so I'm quoting it here:
Thanks for the feedback and encouragement. I'm finishing an improved version of the patch; it would be great if you could rerun those analyses on the updated version and share if it changes anything important. The new version properly checks if the transformation would impact the loads/stores that use the constants. I can see that check having an impact on some real-world scenarios not covered by the existing tests.

The patch now checks if doing the reassociation could break any of the loads/stores that use the added constants. The value type and address space are now obtained from the loads/stores. These changes slightly reduce the number of failing tests:

Failing Tests (5):

LLVM :: CodeGen/AMDGPU/salu-to-valu.ll
LLVM :: CodeGen/SystemZ/int-add-08.ll
LLVM :: CodeGen/SystemZ/int-sub-05.ll
LLVM :: CodeGen/ARM/misched-fusion-aes.ll
LLVM :: CodeGen/ARM/vector-spilling.ll

In D60294#1456609, @efriedma wrote:

Should we care about the number of uses of N0 here? It should generally be "safe" to reassociate in cases where it only has one use.

Let's say we are doing *(x + 0x10000 + 2) and we don't use x + 0x10000 anywhere else (that's the N0, right?). In an arch like RISC-V if we want to add 0x10002 to x that takes two instructions to materialize the constant, while 0x10000 only takes one. We can avoid that second instruction by folding the +2 into the load/store. Am I thinking about this wrong?
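The instruction counts in that argument can be illustrated with a small cost model. This is only a sketch under the assumption that a 32-bit constant is built on RV32 from LUI (upper 20 bits) and ADDI (12-bit signed immediate); the function name materializationCost is hypothetical and is not LLVM's actual constant materialization logic.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical cost model: number of RV32 instructions needed to place
// a 32-bit constant in a register using only LUI and ADDI.
inline int materializationCost(int32_t Value) {
  if (Value >= -2048 && Value <= 2047)
    return 1; // a single ADDI from the zero register
  if ((static_cast<uint32_t>(Value) & 0xFFFu) == 0)
    return 1; // a single LUI, since the low 12 bits are zero
  return 2;   // LUI followed by ADDI
}
```

Under this model 0x10000 costs one instruction and 0x10002 costs two, so keeping the +2 in the load/store immediate saves the second instruction.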

piotr added a subscriber: piotr. Apr 15 2019, 1:03 AM

dmgreen mentioned this in D60677: [ARM] Rewrite isLegalT2AddressImmediate. Apr 15 2019, 1:32 AM

I wasn't thinking in terms of keeping the addressing mode legal, just avoiding destroying the work of constant hoisting. Constant hoisting won't split a constant in the way you're suggesting. And it's relatively easy to write patterns to split "load x+c" in the most efficient way if "c" has a single use. I guess the more restrictive version would allow splitting single-use constants as a DAGCombine, or earlier? Not sure why you'd want to, though.

This updates the patch to:

Add a check for N0.hasOneUse(), as suggested by Eli Friedman;
Change the shuffling indices of the ARM vector-spilling.ll test, to ensure the desired multi-register vector spills and restores are generated;
Update the ARM misched-fusion-aes.ll AES fusion test checks to account for the new instruction scheduling. We still seem to have the desired number of fusable instructions, just in a different order.

Those two tests would remain broken even if you always gave the OK to reassociate in reassociationCanBreakAddressingModePattern. With these changes the remaining failing tests should be:

Failing Tests (3):

LLVM :: CodeGen/AMDGPU/salu-to-valu.ll
LLVM :: CodeGen/SystemZ/int-sub-05.ll
LLVM :: CodeGen/SystemZ/int-add-08.ll

I'll look more closely into those tests. If necessary we can tweak the reassociation gating for those.

Herald added a subscriber: qcolombet. Apr 16 2019, 4:20 AM

dmgreen mentioned this in rL358845: [ARM] Rewrite isLegalT2AddressImmediate. Apr 21 2019, 2:52 AM

dmgreen mentioned this in rG0d741507f7ec: [ARM] Rewrite isLegalT2AddressImmediate.

Fix the remaining failing unit tests (for AMDGPU and SystemZ).

Herald added subscribers: nhaehnle, jvesely. Apr 21 2019, 3:11 PM

Those test case changes represent an actual improvement here, so this looks good. Thanks!

test/CodeGen/SystemZ/int-add-08.ll
53–54	It would be preferable to keep verifying the base register here, i.e. lay [[BASE::%r[1-5]]], 524280(%1) alg {{%r[0-5]}}, 8([[BASE]])
test/CodeGen/SystemZ/int-sub-05.ll
58–59	Same here.

With rL358845 in, the remaining results I ran look good. Consider us out of your way.

Fixes a minor whitespace issue;
Updates the SystemZ tests to preserve the base register check;
Updates the summary to reflect the current patch/review status.

Thanks for the review comments!

In D60294#1476116, @dmgreen wrote:

With rL358845 in, the remaining results I ran look good. Consider us out of your way.

Great! Thanks for looking into this.

uweigand added inline comments. Apr 24 2019, 5:28 AM

test/CodeGen/SystemZ/int-sub-05.ll
58–59	Should be %r[1-5] here as well, register 0 cannot be used for address generation. Otherwise, the SystemZ changes LGTM now. Thanks!

Fix SystemZ's int-sub-05.ll test register matching.

luismarques marked 2 inline comments as done. Apr 24 2019, 5:40 AM

luismarques added inline comments.

test/CodeGen/SystemZ/int-sub-05.ll
58–59	Oops. Thanks!

uweigand added inline comments. Apr 24 2019, 12:19 PM

test/CodeGen/SystemZ/int-sub-05.ll
58–59	Perfect, thanks!

Rebased on master.
@arsenm: could you please review the AMDGPU changes? They were a bit fiddly, so you may prefer to tweak the prefixes in a different way, etc.

zzheng added a child revision: D62833: [DAGCombine][Thumb] Use single CP entry for addressing GV with large offset. Jun 3 2019, 5:17 PM

asb added inline comments. Jun 5 2019, 10:05 PM

lib/CodeGen/CodeGenPrepare.cpp
4202–4203	This comment is now out-of-date, and needs updating.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
1045	Can you not check for a MemSDNode here, and avoid worrying about whether it's a LD or a ST?
test/CodeGen/RISCV/split-offsets-1.ll
3 ↗	(On Diff #201194)	Should probably check RV64 too
4 ↗	(On Diff #201194)	A comment should explain what this test is intending to check
test/CodeGen/RISCV/split-offsets-2.ll
1 ↗	(On Diff #201194)	This doesn't need to be in a separate file to the other test

asb requested changes to this revision. Jun 5 2019, 10:05 PM

This revision now requires changes to proceed. Jun 5 2019, 10:05 PM

Addresses the remaining review concerns.

luismarques marked 5 inline comments as done. Jun 7 2019, 6:44 AM

Thanks, this looks good to me.

This revision is now accepted and ready to land. Jun 7 2019, 11:56 PM

Closed by commit rL363544: [DAGCombiner] [CodeGenPrepare] More comprehensive GEP splitting (authored by luismarques). Jun 17 2019, 3:51 AM

This revision was automatically updated to reflect the committed changes.

craig.topper added a subscriber: craig.topper. Apr 18 2022, 4:09 PM

craig.topper added inline comments.

llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
1076 ↗	(On Diff #205016)	Doing some archaeology. Should this be checking the uses of the `(add, (add, x, offset1), offset2)` expression? It seems to be checking the uses of `(add x, offset1)`.

Herald added a project: Restricted Project. Apr 18 2022, 4:09 PM

Herald added subscribers: pcwang-thead, luke957, StephenFan and 9 others.

luismarques added inline comments. Apr 19 2022, 7:43 AM

llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
1076 ↗	(On Diff #205016)	"It seems to be checking the uses of `(add x, offset1)`." Yes, you're right. "Should this be checking the uses of the `(add, (add, x, offset1), offset2)` expression?" Yes, I think that would be the actually correct thing to do. It seems that, because the `(add x, offset1)` was also being used in the loads/stores, in practice this was working well enough for the test coverage we had (not sure about the real world). In fact, I think if you change it to check the uses of the proper value you won't see any changes at all for all CodeGen tests, for all targets. If you're planning to fix this, can you see if you can add a test case that shows the difference in behavior? Thanks for spotting this issue, Craig! Sorry about the bug.

craig.topper mentioned this in D124644: [DAGCombiner] reassociationCanBreakAddressingModePattern should check uses of the outer add. Apr 28 2022, 4:24 PM

craig.topper mentioned this in rG5f057eaa0ddf: [DAGCombiner] reassociationCanBreakAddressingModePattern should check uses of… May 2 2022, 4:43 PM

Revision Contents

Path	Size
lib/CodeGen/CodeGenPrepare.cpp	13 lines
lib/CodeGen/SelectionDAG/DAGCombiner.cpp	66 lines
lib/Target/RISCV/RISCVISelLowering.h	1 line
test/CodeGen/AMDGPU/salu-to-valu.ll	30 lines
test/CodeGen/ARM/misched-fusion-aes.ll	17 lines
test/CodeGen/ARM/vector-spilling.ll	4 lines
test/CodeGen/RISCV/split-offsets.ll	122 lines
test/CodeGen/SystemZ/int-add-08.ll	10 lines
test/CodeGen/SystemZ/int-sub-05.ll	10 lines

Diff 203550

lib/CodeGen/CodeGenPrepare.cpp

Show First 20 Lines • Show All 4,193 Lines • ▼ Show 20 Lines	if (VariableOperand == -1) {
// event that the offset cannot fit into the r+i addressing mode.		// event that the offset cannot fit into the r+i addressing mode.
// Simple and common case that only one GEP is used in calculating the		// Simple and common case that only one GEP is used in calculating the
// address for the memory access.		// address for the memory access.
Value *Base = AddrInst->getOperand(0);		Value *Base = AddrInst->getOperand(0);
auto *BaseI = dyn_cast<Instruction>(Base);		auto *BaseI = dyn_cast<Instruction>(Base);
auto *GEP = cast<GetElementPtrInst>(AddrInst);		auto *GEP = cast<GetElementPtrInst>(AddrInst);
if (isa<Argument>(Base) || isa<GlobalValue>(Base) ||		if (isa<Argument>(Base) || isa<GlobalValue>(Base) ||
(BaseI && !isa<CastInst>(BaseI) &&		(BaseI && !isa<CastInst>(BaseI) &&
!isa<GetElementPtrInst>(BaseI))) {		!isa<GetElementPtrInst>(BaseI))) {
// If the base is an instruction, make sure the GEP is not in the same		// Make sure the parent block allows inserting non-PHI instructions
// basic block as the base. If the base is an argument or global		// before the terminator.
// value, make sure the GEP is not in the entry block. Otherwise,
// instruction selection can undo the split. Also make sure the
// parent block allows inserting non-PHI instructions before the
// terminator.
BasicBlock *Parent =		BasicBlock *Parent =
BaseI ? BaseI->getParent() : &GEP->getFunction()->getEntryBlock();		BaseI ? BaseI->getParent() : &GEP->getFunction()->getEntryBlock();
if (GEP->getParent() != Parent && !Parent->getTerminator()->isEHPad())		if (!Parent->getTerminator()->isEHPad())
LargeOffsetGEP = std::make_pair(GEP, ConstantOffset);		LargeOffsetGEP = std::make_pair(GEP, ConstantOffset);
}		}
}		}
AddrMode.BaseOffs -= ConstantOffset;		AddrMode.BaseOffs -= ConstantOffset;
return false;		return false;
}		}

// Save the valid addressing mode in case we can't match.		// Save the valid addressing mode in case we can't match.
▲ Show 20 Lines • Show All 513 Lines • ▼ Show 20 Lines	while (!worklist.empty()) {
AddrModeInsts.clear();		AddrModeInsts.clear();
std::pair<AssertingVH<GetElementPtrInst>, int64_t> LargeOffsetGEP(nullptr,		std::pair<AssertingVH<GetElementPtrInst>, int64_t> LargeOffsetGEP(nullptr,
0);		0);
ExtAddrMode NewAddrMode = AddressingModeMatcher::Match(		ExtAddrMode NewAddrMode = AddressingModeMatcher::Match(
V, AccessTy, AddrSpace, MemoryInst, AddrModeInsts, TLI, TRI,		V, AccessTy, AddrSpace, MemoryInst, AddrModeInsts, TLI, TRI,
InsertedInsts, PromotedInsts, TPT, LargeOffsetGEP);		InsertedInsts, PromotedInsts, TPT, LargeOffsetGEP);

GetElementPtrInst *GEP = LargeOffsetGEP.first;		GetElementPtrInst *GEP = LargeOffsetGEP.first;
if (GEP && GEP->getParent() != MemoryInst->getParent() &&		if (GEP && !NewGEPBases.count(GEP)) {
!NewGEPBases.count(GEP)) {
// If splitting the underlying data structure can reduce the offset of a		// If splitting the underlying data structure can reduce the offset of a
// GEP, collect the GEP. Skip the GEPs that are the new bases of		// GEP, collect the GEP. Skip the GEPs that are the new bases of
// previously split data structures.		// previously split data structures.
LargeOffsetGEPMap[GEP->getPointerOperand()].push_back(LargeOffsetGEP);		LargeOffsetGEPMap[GEP->getPointerOperand()].push_back(LargeOffsetGEP);
if (LargeOffsetGEPID.find(GEP) == LargeOffsetGEPID.end())		if (LargeOffsetGEPID.find(GEP) == LargeOffsetGEPID.end())
LargeOffsetGEPID[GEP] = LargeOffsetGEPID.size();		LargeOffsetGEPID[GEP] = LargeOffsetGEPID.size();
}		}


lib/CodeGen/SelectionDAG/DAGCombiner.cpp

Show First 20 Lines • Show All 456 Lines • ▼ Show 20 Lines	private:
SDValue visitFP16_TO_FP(SDNode *N);		SDValue visitFP16_TO_FP(SDNode *N);
SDValue visitVECREDUCE(SDNode *N);		SDValue visitVECREDUCE(SDNode *N);

SDValue visitFADDForFMACombine(SDNode *N);		SDValue visitFADDForFMACombine(SDNode *N);
SDValue visitFSUBForFMACombine(SDNode *N);		SDValue visitFSUBForFMACombine(SDNode *N);
SDValue visitFMULForFMADistributiveCombine(SDNode *N);		SDValue visitFMULForFMADistributiveCombine(SDNode *N);

SDValue XformToShuffleWithZero(SDNode *N);		SDValue XformToShuffleWithZero(SDNode *N);
		bool reassociationCanBreakAddressingModePattern(unsigned Opc,
		const SDLoc &DL, SDValue N0,
		SDValue N1);
SDValue reassociateOpsCommutative(unsigned Opc, const SDLoc &DL, SDValue N0,		SDValue reassociateOpsCommutative(unsigned Opc, const SDLoc &DL, SDValue N0,
SDValue N1);		SDValue N1);
SDValue reassociateOps(unsigned Opc, const SDLoc &DL, SDValue N0,		SDValue reassociateOps(unsigned Opc, const SDLoc &DL, SDValue N0,
SDValue N1, SDNodeFlags Flags);		SDValue N1, SDNodeFlags Flags);

SDValue visitShiftByConstant(SDNode *N, ConstantSDNode *Amt);		SDValue visitShiftByConstant(SDNode *N, ConstantSDNode *Amt);

SDValue foldSelectOfConstants(SDNode *N);		SDValue foldSelectOfConstants(SDNode *N);
▲ Show 20 Lines • Show All 529 Lines • ▼ Show 20 Lines
// undef's.		// undef's.
static bool isAnyConstantBuildVector(SDValue V, bool NoOpaques = false) {		static bool isAnyConstantBuildVector(SDValue V, bool NoOpaques = false) {
if (V.getOpcode() != ISD::BUILD_VECTOR)		if (V.getOpcode() != ISD::BUILD_VECTOR)
return false;		return false;
return isConstantOrConstantVector(V, NoOpaques) ||		return isConstantOrConstantVector(V, NoOpaques) ||
ISD::isBuildVectorOfConstantFPSDNodes(V.getNode());		ISD::isBuildVectorOfConstantFPSDNodes(V.getNode());
}		}

		bool DAGCombiner::reassociationCanBreakAddressingModePattern(unsigned Opc,
		                                                             const SDLoc &DL,
		                                                             SDValue N0,
		                                                             SDValue N1) {
		  // Currently this only tries to ensure we don't undo the GEP splits done by
		  // CodeGenPrepare when shouldConsiderGEPOffsetSplit is true. To ensure this,
		  // we check if the following transformation would be problematic:
		  // (load/store (add, (add, x, offset1), offset2)) ->
		  // (load/store (add, x, offset1+offset2)).

		  if (Opc != ISD::ADD || N0.getOpcode() != ISD::ADD)
		    return false;

		  if (N0.hasOneUse())
		    return false;

		  auto *C1 = dyn_cast<ConstantSDNode>(N0.getOperand(1));
		  auto *C2 = dyn_cast<ConstantSDNode>(N1);
		  if (!C1 || !C2)
		    return false;

		  const APInt &C1APIntVal = C1->getAPIntValue();
		  const APInt &C2APIntVal = C2->getAPIntValue();
		  if (C1APIntVal.getBitWidth() > 64 || C2APIntVal.getBitWidth() > 64)
		    return false;

		  const APInt CombinedValueIntVal = C1APIntVal + C2APIntVal;
		  if (CombinedValueIntVal.getBitWidth() > 64)
		    return false;
		  const int64_t CombinedValue = CombinedValueIntVal.getSExtValue();

		  for (SDNode *Node : N0->uses()) {
		    auto LoadStore = dyn_cast<MemSDNode>(Node);
		    if (LoadStore) {
		      // Is x[offset2] already not a legal addressing mode? If so then
		      // reassociating the constants breaks nothing (we test offset2 because
		      // that's the one we hope to fold into the load or store).
		      TargetLoweringBase::AddrMode AM;
		      AM.HasBaseReg = true;
		      AM.BaseOffs = C2APIntVal.getSExtValue();
		      EVT VT = LoadStore->getMemoryVT();
		      unsigned AS = LoadStore->getAddressSpace();
		      Type *AccessTy = VT.getTypeForEVT(*DAG.getContext());
		      if (!TLI.isLegalAddressingMode(DAG.getDataLayout(), AM, AccessTy, AS))
		        continue;

		      // Would x[offset1+offset2] still be a legal addressing mode?
		      AM.BaseOffs = CombinedValue;
		      if (!TLI.isLegalAddressingMode(DAG.getDataLayout(), AM, AccessTy, AS))
		        return true;
		    }
		  }

		  return false;
		}

// Helper for DAGCombiner::reassociateOps. Try to reassociate an expression		// Helper for DAGCombiner::reassociateOps. Try to reassociate an expression
// such as (Opc N0, N1), if \p N0 is the same kind of operation as \p Opc.		// such as (Opc N0, N1), if \p N0 is the same kind of operation as \p Opc.
SDValue DAGCombiner::reassociateOpsCommutative(unsigned Opc, const SDLoc &DL,		SDValue DAGCombiner::reassociateOpsCommutative(unsigned Opc, const SDLoc &DL,
SDValue N0, SDValue N1) {		SDValue N0, SDValue N1) {
EVT VT = N0.getValueType();		EVT VT = N0.getValueType();

if (N0.getOpcode() != Opc)		if (N0.getOpcode() != Opc)
return SDValue();		return SDValue();
▲ Show 20 Lines • Show All 1,205 Lines • ▼ Show 20 Lines	if (N0.getOpcode() == ISD::OR &&
return DAG.getNode(ISD::ADD, DL, VT, N0.getOperand(0), Add0);		return DAG.getNode(ISD::ADD, DL, VT, N0.getOperand(0), Add0);
}		}
}		}

if (SDValue NewSel = foldBinOpIntoSelect(N))		if (SDValue NewSel = foldBinOpIntoSelect(N))
return NewSel;		return NewSel;

// reassociate add		// reassociate add
		if (!reassociationCanBreakAddressingModePattern(ISD::ADD, DL, N0, N1)) {
if (SDValue RADD = reassociateOps(ISD::ADD, DL, N0, N1, N->getFlags()))		if (SDValue RADD = reassociateOps(ISD::ADD, DL, N0, N1, N->getFlags()))
return RADD;		return RADD;
		}
// fold ((0-A) + B) -> B-A		// fold ((0-A) + B) -> B-A
if (N0.getOpcode() == ISD::SUB && isNullOrNullSplat(N0.getOperand(0)))		if (N0.getOpcode() == ISD::SUB && isNullOrNullSplat(N0.getOperand(0)))
return DAG.getNode(ISD::SUB, DL, VT, N1, N0.getOperand(1));		return DAG.getNode(ISD::SUB, DL, VT, N1, N0.getOperand(1));

// fold (A + (0-B)) -> A-B		// fold (A + (0-B)) -> A-B
if (N1.getOpcode() == ISD::SUB && isNullOrNullSplat(N1.getOperand(0)))		if (N1.getOpcode() == ISD::SUB && isNullOrNullSplat(N1.getOperand(0)))
return DAG.getNode(ISD::SUB, DL, VT, N0, N1.getOperand(1));		return DAG.getNode(ISD::SUB, DL, VT, N0, N1.getOperand(1));


lib/Target/RISCV/RISCVISelLowering.h

Show First 20 Lines • Show All 147 Lines • ▼ Show 20 Lines	private:
bool shouldConvertConstantLoadToIntImm(const APInt &Imm,		bool shouldConvertConstantLoadToIntImm(const APInt &Imm,
Type *Ty) const override {		Type *Ty) const override {
return true;		return true;
}		}

template <class NodeTy>		template <class NodeTy>
SDValue getAddr(NodeTy *N, SelectionDAG &DAG) const;		SDValue getAddr(NodeTy *N, SelectionDAG &DAG) const;

		bool shouldConsiderGEPOffsetSplit() const override { return true; }
SDValue lowerGlobalAddress(SDValue Op, SelectionDAG &DAG) const;		SDValue lowerGlobalAddress(SDValue Op, SelectionDAG &DAG) const;
SDValue lowerBlockAddress(SDValue Op, SelectionDAG &DAG) const;		SDValue lowerBlockAddress(SDValue Op, SelectionDAG &DAG) const;
SDValue lowerConstantPool(SDValue Op, SelectionDAG &DAG) const;		SDValue lowerConstantPool(SDValue Op, SelectionDAG &DAG) const;
SDValue lowerSELECT(SDValue Op, SelectionDAG &DAG) const;		SDValue lowerSELECT(SDValue Op, SelectionDAG &DAG) const;
SDValue lowerVASTART(SDValue Op, SelectionDAG &DAG) const;		SDValue lowerVASTART(SDValue Op, SelectionDAG &DAG) const;
SDValue lowerFRAMEADDR(SDValue Op, SelectionDAG &DAG) const;		SDValue lowerFRAMEADDR(SDValue Op, SelectionDAG &DAG) const;
SDValue lowerRETURNADDR(SDValue Op, SelectionDAG &DAG) const;		SDValue lowerRETURNADDR(SDValue Op, SelectionDAG &DAG) const;
SDValue lowerShiftLeftParts(SDValue Op, SelectionDAG &DAG) const;		SDValue lowerShiftLeftParts(SDValue Op, SelectionDAG &DAG) const;

test/CodeGen/AMDGPU/salu-to-valu.ll

; RUN: llc -amdgpu-scalarize-global-loads=false -march=amdgcn -mcpu=tahiti -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=GCN-NOHSA -check-prefix=SI %s		; RUN: llc -amdgpu-scalarize-global-loads=false -march=amdgcn -mcpu=tahiti -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=GCN-NOHSA -check-prefix=SI %s
; RUN: llc -amdgpu-scalarize-global-loads=false -march=amdgcn -mcpu=bonaire -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=GCN-NOHSA -check-prefix=CI %s		; RUN: llc -amdgpu-scalarize-global-loads=false -march=amdgcn -mcpu=bonaire -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=GCN-NOHSA -check-prefix=CI -check-prefix=CI-NOHSA %s
; RUN: llc -amdgpu-scalarize-global-loads=false -mtriple=amdgcn--amdhsa -mcpu=bonaire -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=CI --check-prefix=GCN-HSA %s		; RUN: llc -amdgpu-scalarize-global-loads=false -mtriple=amdgcn--amdhsa -mcpu=bonaire -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=CI --check-prefix=GCN-HSA %s

declare i32 @llvm.amdgcn.workitem.id.x() #0		declare i32 @llvm.amdgcn.workitem.id.x() #0
declare i32 @llvm.amdgcn.workitem.id.y() #0		declare i32 @llvm.amdgcn.workitem.id.y() #0

; In this test both the pointer and the offset operands to the		; In this test both the pointer and the offset operands to the
; BUFFER_LOAD instructions end up being stored in vgprs. This		; BUFFER_LOAD instructions end up being stored in vgprs. This
; requires us to add the pointer and offset together, store the		; requires us to add the pointer and offset together, store the
▲ Show 20 Lines • Show All 156 Lines • ▼ Show 20 Lines
}		}

; Original scalar load uses SGPR offset on SI and 32-bit literal on		; Original scalar load uses SGPR offset on SI and 32-bit literal on
; CI.		; CI.

; GCN-LABEL: {{^}}smrd_valu_ci_offset_x8:		; GCN-LABEL: {{^}}smrd_valu_ci_offset_x8:
; GCN-NOHSA: s_mov_b32 [[OFFSET0:s[0-9]+]], 0x9a40{{$}}		; GCN-NOHSA: s_mov_b32 [[OFFSET0:s[0-9]+]], 0x9a40{{$}}
; GCN-NOHSA-NOT: v_add		; GCN-NOHSA-NOT: v_add
; GCN-NOHSA: s_mov_b32 [[OFFSET1:s[0-9]+]], 0x9a50{{$}}		; CI-NOHSA: s_mov_b32 [[OFFSET1:s[0-9]+]], 0x9a50{{$}}
; GCN-NOHSA-NOT: v_add		; CI-NOHSA-NOT: v_add
; GCN-NOHSA: buffer_load_dwordx4 v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}}, s[{{[0-9]+:[0-9]+}}], [[OFFSET1]] addr64{{$}}		; SI: buffer_load_dwordx4 v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}}, s[{{[0-9]+:[0-9]+}}], 0 addr64 offset:16
		; CI-NOHSA: buffer_load_dwordx4 v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}}, s[{{[0-9]+:[0-9]+}}], [[OFFSET1]] addr64{{$}}
; GCN-NOHSA: buffer_load_dwordx4 v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}}, s[{{[0-9]+:[0-9]+}}], [[OFFSET0]] addr64{{$}}		; GCN-NOHSA: buffer_load_dwordx4 v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}}, s[{{[0-9]+:[0-9]+}}], [[OFFSET0]] addr64{{$}}

; GCN-NOHSA: v_or_b32_e32 {{v[0-9]+}}, {{s[0-9]+}}, {{v[0-9]+}}		; GCN-NOHSA: v_or_b32_e32 {{v[0-9]+}}, {{s[0-9]+}}, {{v[0-9]+}}
; GCN-NOHSA: v_or_b32_e32 {{v[0-9]+}}, {{s[0-9]+}}, {{v[0-9]+}}		; GCN-NOHSA: v_or_b32_e32 {{v[0-9]+}}, {{s[0-9]+}}, {{v[0-9]+}}
; GCN-NOHSA: v_or_b32_e32 {{v[0-9]+}}, {{s[0-9]+}}, {{v[0-9]+}}		; GCN-NOHSA: v_or_b32_e32 {{v[0-9]+}}, {{s[0-9]+}}, {{v[0-9]+}}
; GCN-NOHSA: v_or_b32_e32 {{v[0-9]+}}, {{s[0-9]+}}, {{v[0-9]+}}		; GCN-NOHSA: v_or_b32_e32 {{v[0-9]+}}, {{s[0-9]+}}, {{v[0-9]+}}
; GCN-NOHSA: v_or_b32_e32 {{v[0-9]+}}, {{s[0-9]+}}, {{v[0-9]+}}		; GCN-NOHSA: v_or_b32_e32 {{v[0-9]+}}, {{s[0-9]+}}, {{v[0-9]+}}
; GCN-NOHSA: v_or_b32_e32 {{v[0-9]+}}, {{s[0-9]+}}, {{v[0-9]+}}		; GCN-NOHSA: v_or_b32_e32 {{v[0-9]+}}, {{s[0-9]+}}, {{v[0-9]+}}
[... 11 lines not shown ...]
%tmp4 = load <8 x i32>, <8 x i32> addrspace(4)* %tmp3		%tmp4 = load <8 x i32>, <8 x i32> addrspace(4)* %tmp3
%tmp5 = or <8 x i32> %tmp4, %c		%tmp5 = or <8 x i32> %tmp4, %c
store <8 x i32> %tmp5, <8 x i32> addrspace(1)* %out		store <8 x i32> %tmp5, <8 x i32> addrspace(1)* %out
ret void		ret void
}		}

; GCN-LABEL: {{^}}smrd_valu_ci_offset_x16:		; GCN-LABEL: {{^}}smrd_valu_ci_offset_x16:

; GCN-NOHSA-DAG: s_mov_b32 [[OFFSET0:s[0-9]+]], 0x13480{{$}}		; SI: s_mov_b32 {{s[0-9]+}}, 0x13480
; GCN-NOHSA-DAG: buffer_load_dwordx4 v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}}, s[{{[0-9]+:[0-9]+}}], [[OFFSET0]] addr64{{$}}		; SI: buffer_load_dwordx4 v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}}, s[{{[0-9]+:[0-9]+}}], 0 addr64 offset:16
; GCN-NOHSA-DAG: s_mov_b32 [[OFFSET1:s[0-9]+]], 0x13490{{$}}		; SI: buffer_load_dwordx4 v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}}, s[{{[0-9]+:[0-9]+}}], 0 addr64 offset:32
; GCN-NOHSA-DAG: buffer_load_dwordx4 v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}}, s[{{[0-9]+:[0-9]+}}], [[OFFSET1]] addr64{{$}}		; SI: buffer_load_dwordx4 v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}}, s[{{[0-9]+:[0-9]+}}], 0 addr64 offset:48
; GCN-NOHSA-DAG: s_mov_b32 [[OFFSET2:s[0-9]+]], 0x134a0{{$}}		; SI: buffer_load_dwordx4 v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}}, s[{{[0-9]+:[0-9]+}}], {{s[0-9]+}} addr64
; GCN-NOHSA-DAG: buffer_load_dwordx4 v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}}, s[{{[0-9]+:[0-9]+}}], [[OFFSET2]] addr64{{$}}		; CI-NOHSA-DAG: s_mov_b32 [[OFFSET0:s[0-9]+]], 0x13480{{$}}
; GCN-NOHSA-DAG: s_mov_b32 [[OFFSET3:s[0-9]+]], 0x134b0{{$}}		; CI-NOHSA-DAG: buffer_load_dwordx4 v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}}, s[{{[0-9]+:[0-9]+}}], [[OFFSET0]] addr64{{$}}
; GCN-NOHSA-DAG: buffer_load_dwordx4 v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}}, s[{{[0-9]+:[0-9]+}}], [[OFFSET3]] addr64{{$}}		; CI-NOHSA-DAG: s_mov_b32 [[OFFSET1:s[0-9]+]], 0x13490{{$}}
		; CI-NOHSA-DAG: buffer_load_dwordx4 v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}}, s[{{[0-9]+:[0-9]+}}], [[OFFSET1]] addr64{{$}}
		; CI-NOHSA-DAG: s_mov_b32 [[OFFSET2:s[0-9]+]], 0x134a0{{$}}
		; CI-NOHSA-DAG: buffer_load_dwordx4 v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}}, s[{{[0-9]+:[0-9]+}}], [[OFFSET2]] addr64{{$}}
		; CI-NOHSA-DAG: s_mov_b32 [[OFFSET3:s[0-9]+]], 0x134b0{{$}}
		; CI-NOHSA-DAG: buffer_load_dwordx4 v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}}, s[{{[0-9]+:[0-9]+}}], [[OFFSET3]] addr64{{$}}

; GCN-NOHSA: v_or_b32_e32 {{v[0-9]+}}, {{s[0-9]+}}, {{v[0-9]+}}		; GCN-NOHSA: v_or_b32_e32 {{v[0-9]+}}, {{s[0-9]+}}, {{v[0-9]+}}
; GCN-NOHSA: v_or_b32_e32 {{v[0-9]+}}, {{s[0-9]+}}, {{v[0-9]+}}		; GCN-NOHSA: v_or_b32_e32 {{v[0-9]+}}, {{s[0-9]+}}, {{v[0-9]+}}
; GCN-NOHSA: v_or_b32_e32 {{v[0-9]+}}, {{s[0-9]+}}, {{v[0-9]+}}		; GCN-NOHSA: v_or_b32_e32 {{v[0-9]+}}, {{s[0-9]+}}, {{v[0-9]+}}
; GCN-NOHSA: v_or_b32_e32 {{v[0-9]+}}, {{s[0-9]+}}, {{v[0-9]+}}		; GCN-NOHSA: v_or_b32_e32 {{v[0-9]+}}, {{s[0-9]+}}, {{v[0-9]+}}
; GCN-NOHSA: v_or_b32_e32 {{v[0-9]+}}, {{s[0-9]+}}, {{v[0-9]+}}		; GCN-NOHSA: v_or_b32_e32 {{v[0-9]+}}, {{s[0-9]+}}, {{v[0-9]+}}
; GCN-NOHSA: v_or_b32_e32 {{v[0-9]+}}, {{s[0-9]+}}, {{v[0-9]+}}		; GCN-NOHSA: v_or_b32_e32 {{v[0-9]+}}, {{s[0-9]+}}, {{v[0-9]+}}
; GCN-NOHSA: v_or_b32_e32 {{v[0-9]+}}, {{s[0-9]+}}, {{v[0-9]+}}		; GCN-NOHSA: v_or_b32_e32 {{v[0-9]+}}, {{s[0-9]+}}, {{v[0-9]+}}
[... 288 lines not shown ...]

test/CodeGen/ARM/misched-fusion-aes.ll

[... 70 lines not shown ...]

; CHECK-LABEL: aesea:		; CHECK-LABEL: aesea:
; CHECK: aese.8 [[QA:q[0-9][0-9]?]], {{q[0-9][0-9]?}}		; CHECK: aese.8 [[QA:q[0-9][0-9]?]], {{q[0-9][0-9]?}}
; CHECK-NEXT: aesmc.8 {{q[0-9][0-9]?}}, [[QA]]		; CHECK-NEXT: aesmc.8 {{q[0-9][0-9]?}}, [[QA]]

; CHECK: aese.8 [[QB:q[0-9][0-9]?]], {{q[0-9][0-9]?}}		; CHECK: aese.8 [[QB:q[0-9][0-9]?]], {{q[0-9][0-9]?}}
; CHECK-NEXT: aesmc.8 {{q[0-9][0-9]?}}, [[QB]]		; CHECK-NEXT: aesmc.8 {{q[0-9][0-9]?}}, [[QB]]

		; CHECK: aese.8 {{q[0-9][0-9]?}}, {{q[0-9][0-9]?}}
; CHECK: aese.8 [[QC:q[0-9][0-9]?]], {{q[0-9][0-9]?}}		; CHECK: aese.8 [[QC:q[0-9][0-9]?]], {{q[0-9][0-9]?}}
; CHECK-NEXT: aesmc.8 {{q[0-9][0-9]?}}, [[QC]]		; CHECK-NEXT: aesmc.8 {{q[0-9][0-9]?}}, [[QC]]

; CHECK: aese.8 {{q[0-9][0-9]?}}, {{q[0-9][0-9]?}}
; CHECK: aese.8 [[QD:q[0-9][0-9]?]], {{q[0-9][0-9]?}}		; CHECK: aese.8 [[QD:q[0-9][0-9]?]], {{q[0-9][0-9]?}}
; CHECK-NEXT: aesmc.8 {{q[0-9][0-9]?}}, [[QD]]		; CHECK-NEXT: aesmc.8 {{q[0-9][0-9]?}}, [[QD]]

; CHECK: aese.8 {{q[0-9][0-9]?}}, {{q[0-9][0-9]?}}
; CHECK: aese.8 [[QE:q[0-9][0-9]?]], {{q[0-9][0-9]?}}		; CHECK: aese.8 [[QE:q[0-9][0-9]?]], {{q[0-9][0-9]?}}
; CHECK-NEXT: aesmc.8 {{q[0-9][0-9]?}}, [[QE]]		; CHECK-NEXT: aesmc.8 {{q[0-9][0-9]?}}, [[QE]]

; CHECK: aese.8 [[QF:q[0-9][0-9]?]], {{q[0-9][0-9]?}}		; CHECK: aese.8 [[QF:q[0-9][0-9]?]], {{q[0-9][0-9]?}}
; CHECK-NEXT: aesmc.8 {{q[0-9][0-9]?}}, [[QF]]		; CHECK-NEXT: aesmc.8 {{q[0-9][0-9]?}}, [[QF]]

		; CHECK: aese.8 {{q[0-9][0-9]?}}, {{q[0-9][0-9]?}}
; CHECK: aese.8 [[QG:q[0-9][0-9]?]], {{q[0-9][0-9]?}}		; CHECK: aese.8 [[QG:q[0-9][0-9]?]], {{q[0-9][0-9]?}}
; CHECK-NEXT: aesmc.8 {{q[0-9][0-9]?}}, [[QG]]		; CHECK-NEXT: aesmc.8 {{q[0-9][0-9]?}}, [[QG]]

; CHECK: aese.8 {{q[0-9][0-9]?}}, {{q[0-9][0-9]?}}		; CHECK: aese.8 {{q[0-9][0-9]?}}, {{q[0-9][0-9]?}}

; CHECK: aese.8 [[QH:q[0-9][0-9]?]], {{q[0-9][0-9]?}}		; CHECK: aese.8 [[QH:q[0-9][0-9]?]], {{q[0-9][0-9]?}}
; CHECK-NEXT: aesmc.8 {{q[0-9][0-9]?}}, [[QH]]		; CHECK-NEXT: aesmc.8 {{q[0-9][0-9]?}}, [[QH]]
}		}

define void @aesda(<16 x i8>* %a0, <16 x i8>* %b0, <16 x i8>* %c0, <16 x i8> %d, <16 x i8> %e) {		define void @aesda(<16 x i8>* %a0, <16 x i8>* %b0, <16 x i8>* %c0, <16 x i8> %d, <16 x i8> %e) {
%d0 = load <16 x i8>, <16 x i8>* %a0		%d0 = load <16 x i8>, <16 x i8>* %a0
%a1 = getelementptr inbounds <16 x i8>, <16 x i8>* %a0, i64 1		%a1 = getelementptr inbounds <16 x i8>, <16 x i8>* %a0, i64 1
%d1 = load <16 x i8>, <16 x i8>* %a1		%d1 = load <16 x i8>, <16 x i8>* %a1
[... 55 lines not shown ...]
store <16 x i8> %h2, <16 x i8>* %c2		store <16 x i8> %h2, <16 x i8>* %c2
%c3 = getelementptr inbounds <16 x i8>, <16 x i8>* %c0, i64 3		%c3 = getelementptr inbounds <16 x i8>, <16 x i8>* %c0, i64 3
store <16 x i8> %h3, <16 x i8>* %c3		store <16 x i8> %h3, <16 x i8>* %c3
ret void		ret void

; CHECK-LABEL: aesda:		; CHECK-LABEL: aesda:
; CHECK: aesd.8 [[QA:q[0-9][0-9]?]], {{q[0-9][0-9]?}}		; CHECK: aesd.8 [[QA:q[0-9][0-9]?]], {{q[0-9][0-9]?}}
; CHECK-NEXT: aesimc.8 {{q[0-9][0-9]?}}, [[QA]]		; CHECK-NEXT: aesimc.8 {{q[0-9][0-9]?}}, [[QA]]

; CHECK: aesd.8 [[QB:q[0-9][0-9]?]], {{q[0-9][0-9]?}}		; CHECK: aesd.8 [[QB:q[0-9][0-9]?]], {{q[0-9][0-9]?}}
; CHECK-NEXT: aesimc.8 {{q[0-9][0-9]?}}, [[QB]]		; CHECK-NEXT: aesimc.8 {{q[0-9][0-9]?}}, [[QB]]

		; CHECK: aesd.8 {{q[0-9][0-9]?}}, {{q[0-9][0-9]?}}
; CHECK: aesd.8 [[QC:q[0-9][0-9]?]], {{q[0-9][0-9]?}}		; CHECK: aesd.8 [[QC:q[0-9][0-9]?]], {{q[0-9][0-9]?}}
; CHECK-NEXT: aesimc.8 {{q[0-9][0-9]?}}, [[QC]]		; CHECK-NEXT: aesimc.8 {{q[0-9][0-9]?}}, [[QC]]
; CHECK: aesd.8 {{q[0-9][0-9]?}}, {{q[0-9][0-9]?}}
; CHECK: aesd.8 [[QD:q[0-9][0-9]?]], {{q[0-9][0-9]?}}		; CHECK: aesd.8 [[QD:q[0-9][0-9]?]], {{q[0-9][0-9]?}}
; CHECK-NEXT: aesimc.8 {{q[0-9][0-9]?}}, [[QD]]		; CHECK-NEXT: aesimc.8 {{q[0-9][0-9]?}}, [[QD]]
; CHECK: aesd.8 {{q[0-9][0-9]?}}, {{q[0-9][0-9]?}}
; CHECK: aesd.8 [[QE:q[0-9][0-9]?]], {{q[0-9][0-9]?}}		; CHECK: aesd.8 [[QE:q[0-9][0-9]?]], {{q[0-9][0-9]?}}
; CHECK-NEXT: aesimc.8 {{q[0-9][0-9]?}}, [[QE]]		; CHECK-NEXT: aesimc.8 {{q[0-9][0-9]?}}, [[QE]]

; CHECK: aesd.8 [[QF:q[0-9][0-9]?]], {{q[0-9][0-9]?}}		; CHECK: aesd.8 [[QF:q[0-9][0-9]?]], {{q[0-9][0-9]?}}
; CHECK-NEXT: aesimc.8 {{q[0-9][0-9]?}}, [[QF]]		; CHECK-NEXT: aesimc.8 {{q[0-9][0-9]?}}, [[QF]]

		; CHECK: aesd.8 {{q[0-9][0-9]?}}, {{q[0-9][0-9]?}}
; CHECK: aesd.8 [[QG:q[0-9][0-9]?]], {{q[0-9][0-9]?}}		; CHECK: aesd.8 [[QG:q[0-9][0-9]?]], {{q[0-9][0-9]?}}
; CHECK-NEXT: aesimc.8 {{q[0-9][0-9]?}}, [[QG]]		; CHECK-NEXT: aesimc.8 {{q[0-9][0-9]?}}, [[QG]]

; CHECK: aesd.8 {{q[0-9][0-9]?}}, {{q[0-9][0-9]?}}		; CHECK: aesd.8 {{q[0-9][0-9]?}}, {{q[0-9][0-9]?}}
; CHECK: aesd.8 [[QH:q[0-9][0-9]?]], {{q[0-9][0-9]?}}		; CHECK: aesd.8 [[QH:q[0-9][0-9]?]], {{q[0-9][0-9]?}}
; CHECK-NEXT: aesimc.8 {{q[0-9][0-9]?}}, [[QH]]		; CHECK-NEXT: aesimc.8 {{q[0-9][0-9]?}}, [[QH]]
}		}

define void @aes_load_store(<16 x i8> %p1, <16 x i8> %p2 , <16 x i8> *%p3) {		define void @aes_load_store(<16 x i8> %p1, <16 x i8> %p2 , <16 x i8> *%p3) {
entry:		entry:
%x1 = alloca <16 x i8>, align 16		%x1 = alloca <16 x i8>, align 16
[... 12 lines not shown ...]
store <16 x i8> %aese2, <16 x i8>* %x4, align 16		store <16 x i8> %aese2, <16 x i8>* %x4, align 16
%aesmc2= call <16 x i8> @llvm.arm.neon.aesmc(<16 x i8> %aese2) #2		%aesmc2= call <16 x i8> @llvm.arm.neon.aesmc(<16 x i8> %aese2) #2
store <16 x i8> %aesmc2, <16 x i8>* %x5, align 16		store <16 x i8> %aesmc2, <16 x i8>* %x5, align 16
ret void		ret void

; CHECK-LABEL: aes_load_store:		; CHECK-LABEL: aes_load_store:
; CHECK: aese.8 [[QA:q[0-9][0-9]?]], {{q[0-9][0-9]?}}		; CHECK: aese.8 [[QA:q[0-9][0-9]?]], {{q[0-9][0-9]?}}
; CHECK-NEXT: aesmc.8 {{q[0-9][0-9]?}}, [[QA]]		; CHECK-NEXT: aesmc.8 {{q[0-9][0-9]?}}, [[QA]]

; CHECK: aese.8 [[QB:q[0-9][0-9]?]], {{q[0-9][0-9]?}}		; CHECK: aese.8 [[QB:q[0-9][0-9]?]], {{q[0-9][0-9]?}}
; CHECK-NEXT: aesmc.8 {{q[0-9][0-9]?}}, [[QB]]		; CHECK-NEXT: aesmc.8 {{q[0-9][0-9]?}}, [[QB]]
}		}

test/CodeGen/ARM/vector-spilling.ll

[... 16 lines not shown ...]
%3 = load <8 x i64>, <8 x i64>* %2, align 8		%3 = load <8 x i64>, <8 x i64>* %2, align 8

%4 = getelementptr inbounds <8 x i64>, <8 x i64>* %src, i32 2		%4 = getelementptr inbounds <8 x i64>, <8 x i64>* %src, i32 2
%5 = load <8 x i64>, <8 x i64>* %4, align 8		%5 = load <8 x i64>, <8 x i64>* %4, align 8

%6 = getelementptr inbounds <8 x i64>, <8 x i64>* %src, i32 3		%6 = getelementptr inbounds <8 x i64>, <8 x i64>* %src, i32 3
%7 = load <8 x i64>, <8 x i64>* %6, align 8		%7 = load <8 x i64>, <8 x i64>* %6, align 8

%8 = shufflevector <8 x i64> %1, <8 x i64> %3, <8 x i32> <i32 0, i32 8, i32 1, i32 9, i32 2, i32 10, i32 3, i32 11>		%8 = shufflevector <8 x i64> %1, <8 x i64> %3, <8 x i32> <i32 12, i32 4, i32 15, i32 14, i32 8, i32 13, i32 2, i32 9>
%9 = shufflevector <8 x i64> %1, <8 x i64> %3, <8 x i32> <i32 4, i32 12, i32 5, i32 13, i32 6, i32 14, i32 7, i32 15>		%9 = shufflevector <8 x i64> %1, <8 x i64> %3, <8 x i32> <i32 1, i32 0, i32 3, i32 10, i32 5, i32 11, i32 7, i32 6>

tail call void(<8 x i64>, <8 x i64>, <8 x i64>, <8 x i64>, <8 x i64>, <8 x i64>) @foo(<8 x i64> %1, <8 x i64> %3, <8 x i64> %5, <8 x i64> %7, <8 x i64> %8, <8 x i64> %9)		tail call void(<8 x i64>, <8 x i64>, <8 x i64>, <8 x i64>, <8 x i64>, <8 x i64>) @foo(<8 x i64> %1, <8 x i64> %3, <8 x i64> %5, <8 x i64> %7, <8 x i64> %8, <8 x i64> %9)
ret void		ret void
}		}

declare void @foo(<8 x i64>, <8 x i64>, <8 x i64>, <8 x i64>, <8 x i64>, <8 x i64>)		declare void @foo(<8 x i64>, <8 x i64>, <8 x i64>, <8 x i64>, <8 x i64>, <8 x i64>)

attributes #0 = { noredzone "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-frame-pointer-elim-non-leaf"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "unsafe-fp-math"="false" "use-soft-float"="false" }		attributes #0 = { noredzone "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-frame-pointer-elim-non-leaf"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "unsafe-fp-math"="false" "use-soft-float"="false" }

test/CodeGen/RISCV/split-offsets.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mtriple=riscv32 -verify-machineinstrs < %s \
				; RUN: | FileCheck %s -check-prefix=RV32I
				; RUN: llc -mtriple=riscv64 -verify-machineinstrs < %s \
				; RUN: | FileCheck %s -check-prefix=RV64I

				; Check that memory accesses to array elements with large offsets have those
				; offsets split into a base offset, plus a smaller offset that is folded into
				; the memory operation. We should also only compute that base offset once,
				; since it can be shared for all memory operations in this test.
				define void @test1([65536 x i32]** %sp, [65536 x i32]* %t, i32 %n) {
				; RV32I-LABEL: test1:
				; RV32I: # %bb.0: # %entry
				; RV32I-NEXT: lui a2, 20
				; RV32I-NEXT: addi a2, a2, -1920
				; RV32I-NEXT: lw a0, 0(a0)
				; RV32I-NEXT: add a0, a0, a2
				; RV32I-NEXT: addi a3, zero, 1
				; RV32I-NEXT: sw a3, 4(a0)
				; RV32I-NEXT: addi a4, zero, 2
				; RV32I-NEXT: sw a4, 0(a0)
				; RV32I-NEXT: add a0, a1, a2
				; RV32I-NEXT: sw a4, 4(a0)
				; RV32I-NEXT: sw a3, 0(a0)
				; RV32I-NEXT: ret
				;
				; RV64I-LABEL: test1:
				; RV64I: # %bb.0: # %entry
				; RV64I-NEXT: lui a2, 20
				; RV64I-NEXT: addiw a2, a2, -1920
				; RV64I-NEXT: ld a0, 0(a0)
				; RV64I-NEXT: add a0, a0, a2
				; RV64I-NEXT: addi a3, zero, 1
				; RV64I-NEXT: sw a3, 4(a0)
				; RV64I-NEXT: addi a4, zero, 2
				; RV64I-NEXT: sw a4, 0(a0)
				; RV64I-NEXT: add a0, a1, a2
				; RV64I-NEXT: sw a4, 4(a0)
				; RV64I-NEXT: sw a3, 0(a0)
				; RV64I-NEXT: ret
				entry:
				%s = load [65536 x i32]*, [65536 x i32]** %sp
				%gep0 = getelementptr [65536 x i32], [65536 x i32]* %s, i64 0, i32 20000
				%gep1 = getelementptr [65536 x i32], [65536 x i32]* %s, i64 0, i32 20001
				%gep2 = getelementptr [65536 x i32], [65536 x i32]* %t, i64 0, i32 20000
				%gep3 = getelementptr [65536 x i32], [65536 x i32]* %t, i64 0, i32 20001
				store i32 2, i32* %gep0
				store i32 1, i32* %gep1
				store i32 1, i32* %gep2
				store i32 2, i32* %gep3
				ret void
				}

				; Ditto. Check it when the GEPs are not in the entry block.
				define void @test2([65536 x i32]** %sp, [65536 x i32]* %t, i32 %n) {
				; RV32I-LABEL: test2:
				; RV32I: # %bb.0: # %entry
				; RV32I-NEXT: lui a3, 20
				; RV32I-NEXT: addi a3, a3, -1920
				; RV32I-NEXT: lw a0, 0(a0)
				; RV32I-NEXT: add a0, a0, a3
				; RV32I-NEXT: add a1, a1, a3
				; RV32I-NEXT: mv a3, zero
				; RV32I-NEXT: bge a3, a2, .LBB1_2
				; RV32I-NEXT: .LBB1_1: # %while_body
				; RV32I-NEXT: # =>This Inner Loop Header: Depth=1
				; RV32I-NEXT: sw a3, 4(a0)
				; RV32I-NEXT: addi a4, a3, 1
				; RV32I-NEXT: sw a4, 0(a0)
				; RV32I-NEXT: sw a3, 4(a1)
				; RV32I-NEXT: sw a4, 0(a1)
				; RV32I-NEXT: mv a3, a4
				; RV32I-NEXT: blt a3, a2, .LBB1_1
				; RV32I-NEXT: .LBB1_2: # %while_end
				; RV32I-NEXT: ret
				;
				; RV64I-LABEL: test2:
				; RV64I: # %bb.0: # %entry
				; RV64I-NEXT: lui a3, 20
				; RV64I-NEXT: addiw a3, a3, -1920
				; RV64I-NEXT: ld a0, 0(a0)
				; RV64I-NEXT: add a0, a0, a3
				; RV64I-NEXT: add a1, a1, a3
				; RV64I-NEXT: sext.w a2, a2
				; RV64I-NEXT: mv a3, zero
				; RV64I-NEXT: sext.w a4, a3
				; RV64I-NEXT: bge a4, a2, .LBB1_2
				; RV64I-NEXT: .LBB1_1: # %while_body
				; RV64I-NEXT: # =>This Inner Loop Header: Depth=1
				; RV64I-NEXT: sw a3, 4(a0)
				; RV64I-NEXT: addi a4, a3, 1
				; RV64I-NEXT: sw a4, 0(a0)
				; RV64I-NEXT: sw a3, 4(a1)
				; RV64I-NEXT: sw a4, 0(a1)
				; RV64I-NEXT: mv a3, a4
				; RV64I-NEXT: sext.w a4, a3
				; RV64I-NEXT: blt a4, a2, .LBB1_1
				; RV64I-NEXT: .LBB1_2: # %while_end
				; RV64I-NEXT: ret
				entry:
				%s = load [65536 x i32]*, [65536 x i32]** %sp
				br label %while_cond
				while_cond:
				%phi = phi i32 [ 0, %entry ], [ %i, %while_body ]
				%gep0 = getelementptr [65536 x i32], [65536 x i32]* %s, i64 0, i32 20000
				%gep1 = getelementptr [65536 x i32], [65536 x i32]* %s, i64 0, i32 20001
				%gep2 = getelementptr [65536 x i32], [65536 x i32]* %t, i64 0, i32 20000
				%gep3 = getelementptr [65536 x i32], [65536 x i32]* %t, i64 0, i32 20001
				%cmp = icmp slt i32 %phi, %n
				br i1 %cmp, label %while_body, label %while_end
				while_body:
				%i = add i32 %phi, 1
				%j = add i32 %phi, 2
				store i32 %i, i32* %gep0
				store i32 %phi, i32* %gep1
				store i32 %i, i32* %gep2
				store i32 %phi, i32* %gep3
				br label %while_cond
				while_end:
				ret void
				}
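The constants in the CHECK lines above can be reproduced by hand. The following is a minimal sketch, not code from the patch: `split_lui_addi` and `i32_elem_offset` are hypothetical helper names, and the sketch assumes a non-negative byte offset. It shows why element 20000 of an `i32` array (byte offset 80000) is materialised as `lui 20` plus `addi -1920`, and why element 20001 is then reached with the folded immediate `4`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of the patch): the two immediates used to
   materialise a 32-bit constant with a lui/addi pair. */
typedef struct {
    int32_t hi20; /* lui operand; contributes hi20 * 4096 */
    int32_t lo12; /* addi operand; a signed 12-bit value in [-2048, 2047] */
} riscv_imm_split;

/* Split a non-negative byte offset into lui and addi immediates. */
static riscv_imm_split split_lui_addi(int32_t offset) {
    riscv_imm_split s;
    assert(offset >= 0); /* assumption of this sketch */
    /* Round to the nearest multiple of 4096 so the remainder fits in 12 bits. */
    s.hi20 = (offset + 0x800) >> 12;
    s.lo12 = offset - s.hi20 * 4096;
    return s;
}

/* Byte offset of element `index` in an array of 4-byte i32 values. */
static int32_t i32_elem_offset(int32_t index) {
    return index * 4;
}
```

With these helpers, `split_lui_addi(i32_elem_offset(20000))` gives `hi20 == 20` and `lo12 == -1920`, matching `lui a2, 20` and `addi a2, a2, -1920`. The remaining distance to element 20001 is 4 bytes, which is the `4(a0)` immediate in the `sw` instructions.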

test/CodeGen/SystemZ/int-add-08.ll

[... 44 lines not shown ...]
%add = add i128 %a, %b		%add = add i128 %a, %b
store i128 %add, i128 *%aptr		store i128 %add, i128 *%aptr
ret void		ret void
}		}

; Test the next doubleword up, which requires separate address logic for ALG.		; Test the next doubleword up, which requires separate address logic for ALG.
define void @f4(i128 *%aptr, i64 %base) {		define void @f4(i128 *%aptr, i64 %base) {
; CHECK-LABEL: f4:		; CHECK-LABEL: f4:
; CHECK: lgr [[BASE:%r[1-5]]], %r3		; CHECK: lay [[BASE:%r[1-5]]], 524280(%r3)
; CHECK: agfi [[BASE]], 524288		; CHECK: alg {{%r[0-5]}}, 8([[BASE]])
		uweigand: It would be preferable to keep verifying the base register here, i.e.
		    lay [[BASE::%r[1-5]]], 524280(%1)
		    alg {{%r[0-5]}}, 8([[BASE]])
; CHECK: alg {{%r[0-5]}}, 0([[BASE]])
; CHECK: alcg {{%r[0-5]}}, 524280(%r3)		; CHECK: alcg {{%r[0-5]}}, 524280(%r3)
; CHECK: br %r14		; CHECK: br %r14
%addr = add i64 %base, 524280		%addr = add i64 %base, 524280
%bptr = inttoptr i64 %addr to i128 *		%bptr = inttoptr i64 %addr to i128 *
%a = load volatile i128, i128 *%aptr		%a = load volatile i128, i128 *%aptr
%b = load i128, i128 *%bptr		%b = load i128, i128 *%bptr
%add = add i128 %a, %b		%add = add i128 %a, %b
store i128 %add, i128 *%aptr		store i128 %add, i128 *%aptr
ret void		ret void
}		}

; Test the next doubleword after that, which requires separate logic for		; Test the next doubleword after that, which requires separate logic for
; both instructions. It would be better to create an anchor at 524288		; both instructions.
; that both instructions can use, but that isn't implemented yet.
define void @f5(i128 *%aptr, i64 %base) {		define void @f5(i128 *%aptr, i64 %base) {
; CHECK-LABEL: f5:		; CHECK-LABEL: f5:
; CHECK: alg {{%r[0-5]}}, 0({{%r[1-5]}})		; CHECK: alg {{%r[0-5]}}, 8({{%r[1-5]}})
; CHECK: alcg {{%r[0-5]}}, 0({{%r[1-5]}})		; CHECK: alcg {{%r[0-5]}}, 0({{%r[1-5]}})
; CHECK: br %r14		; CHECK: br %r14
%addr = add i64 %base, 524288		%addr = add i64 %base, 524288
%bptr = inttoptr i64 %addr to i128 *		%bptr = inttoptr i64 %addr to i128 *
%a = load volatile i128, i128 *%aptr		%a = load volatile i128, i128 *%aptr
%b = load i128, i128 *%bptr		%b = load i128, i128 *%bptr
%add = add i128 %a, %b		%add = add i128 %a, %b
store i128 %add, i128 *%aptr		store i128 %add, i128 *%aptr
[... 67 lines not shown ...]
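The change to `f4` follows from the displacement range of the instructions involved. As a rough sketch, assuming the signed 20-bit displacement of the SystemZ long-displacement instruction forms (`fits_disp20` and `disp_from_anchor` are hypothetical names, not part of the patch): the high doubleword at 524280 is still addressable directly, the low doubleword at 524288 is not, but it is only 8 bytes away from an anchor at 524280.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: does `d` fit a signed 20-bit displacement field? */
static bool fits_disp20(int64_t d) {
    return d >= -524288 && d <= 524287;
}

/* Displacement needed to reach `target` from a base that already includes
   `anchor` (both are byte offsets from the original base register). */
static int64_t disp_from_anchor(int64_t anchor, int64_t target) {
    assert(fits_disp20(anchor)); /* the anchor itself must be encodable */
    return target - anchor;
}
```

Here `fits_disp20(524280)` holds and `fits_disp20(524288)` does not, while `disp_from_anchor(524280, 524288)` is 8. That corresponds to the new checks `lay [[BASE]], 524280(%r3)` followed by `alg ..., 8([[BASE]])`.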

test/CodeGen/SystemZ/int-sub-05.ll

[... 49 lines not shown ...]
%sub = sub i128 %a, %b		%sub = sub i128 %a, %b
store i128 %sub, i128 *%aptr		store i128 %sub, i128 *%aptr
ret void		ret void
}		}

; Test the next doubleword up, which requires separate address logic for SLG.		; Test the next doubleword up, which requires separate address logic for SLG.
define void @f4(i64 %base) {		define void @f4(i64 %base) {
; CHECK-LABEL: f4:		; CHECK-LABEL: f4:
; CHECK: lgr [[BASE:%r[1-5]]], %r2		; CHECK: lay [[BASE:%r[1-5]]], 524280(%r2)
; CHECK: agfi [[BASE]], 524288		; CHECK: slg {{%r[0-5]}}, 8([[BASE]])
		uweigand: Same here.
		uweigand: Should be %r[1-5] here as well, register 0 cannot be used for address generation. Otherwise, the SystemZ changes LGTM now. Thanks!
		luismarques: Oops. Thanks!
		uweigand: Perfect, thanks!
; CHECK: slg {{%r[0-5]}}, 0([[BASE]])
; CHECK: slbg {{%r[0-5]}}, 524280(%r2)		; CHECK: slbg {{%r[0-5]}}, 524280(%r2)
; CHECK: br %r14		; CHECK: br %r14
%addr = add i64 %base, 524280		%addr = add i64 %base, 524280
%bptr = inttoptr i64 %addr to i128 *		%bptr = inttoptr i64 %addr to i128 *
%aptr = getelementptr i128, i128 *%bptr, i64 -8		%aptr = getelementptr i128, i128 *%bptr, i64 -8
%a = load i128, i128 *%aptr		%a = load i128, i128 *%aptr
%b = load i128, i128 *%bptr		%b = load i128, i128 *%bptr
%sub = sub i128 %a, %b		%sub = sub i128 %a, %b
store i128 %sub, i128 *%aptr		store i128 %sub, i128 *%aptr
ret void		ret void
}		}

; Test the next doubleword after that, which requires separate logic for		; Test the next doubleword after that, which requires separate logic for
; both instructions. It would be better to create an anchor at 524288		; both instructions.
; that both instructions can use, but that isn't implemented yet.
define void @f5(i64 %base) {		define void @f5(i64 %base) {
; CHECK-LABEL: f5:		; CHECK-LABEL: f5:
; CHECK: slg {{%r[0-5]}}, 0({{%r[1-5]}})		; CHECK: slg {{%r[0-5]}}, 8({{%r[1-5]}})
; CHECK: slbg {{%r[0-5]}}, 0({{%r[1-5]}})		; CHECK: slbg {{%r[0-5]}}, 0({{%r[1-5]}})
; CHECK: br %r14		; CHECK: br %r14
%addr = add i64 %base, 524288		%addr = add i64 %base, 524288
%bptr = inttoptr i64 %addr to i128 *		%bptr = inttoptr i64 %addr to i128 *
%aptr = getelementptr i128, i128 *%bptr, i64 -8		%aptr = getelementptr i128, i128 *%bptr, i64 -8
%a = load i128, i128 *%aptr		%a = load i128, i128 *%aptr
%b = load i128, i128 *%bptr		%b = load i128, i128 *%bptr
%sub = sub i128 %a, %b		%sub = sub i128 %a, %b
[... 70 lines not shown ...]