This is an archive of the discontinued LLVM Phabricator instance.

[MachineInstr] Add support for instructions with multiple memory operands.
ClosedPublic

Authored by hliao on Oct 14 2020, 10:10 PM.


Details

Reviewers

efriedma
qcolombet
hfinkel
craig.topper
t.p.northover
uweigand
dmgreen

Commits

rG4b1120159274: [MachineInstr] Add support for instructions with multiple memory operands.

Summary

For each pair of memory operands drawn from the two instructions, check whether the pair may alias, and return true if any pair does.
The exception is a memory instruction without any memory operand: it may touch anything and is therefore assumed to alias with any other memory instruction.
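
As a rough illustration of the summary above (a minimal sketch, not the patch itself): `MemRange`, `rangesMayAlias`, and `instrsMayAlias` below are hypothetical names that model a memory operand as a byte range inside a numbered object. The real code works on MachineMemOperand and may also consult AA. The sketch assumes both inputs are already known to be memory instructions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a memory operand: the byte range
// [Offset, Offset + Size) inside object number `Object`.
// This is not llvm::MachineMemOperand.
struct MemRange {
  int Object;
  std::int64_t Offset;
  std::uint64_t Size;
};

// Two ranges may alias only if they name the same object and overlap.
inline bool rangesMayAlias(const MemRange &A, const MemRange &B) {
  assert(A.Offset >= 0 && B.Offset >= 0 && "negative offset");
  if (A.Object != B.Object)
    return false;
  const MemRange &Lo = (A.Offset <= B.Offset) ? A : B;
  const MemRange &Hi = (A.Offset <= B.Offset) ? B : A;
  return Lo.Offset + static_cast<std::int64_t>(Lo.Size) > Hi.Offset;
}

// Instruction-level query: an empty operand list means "may touch anything",
// so the answer is conservatively true. Otherwise any aliasing pair suffices.
inline bool instrsMayAlias(const std::vector<MemRange> &OpsA,
                           const std::vector<MemRange> &OpsB) {
  if (OpsA.empty() || OpsB.empty())
    return true;
  for (const MemRange &A : OpsA)
    for (const MemRange &B : OpsB)
      if (rangesMayAlias(A, B))
        return true;
  return false;
}
```

The nested loop is what makes the cost the product of the two operand counts, which is the source of the compile-time concern raised later in this thread.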

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

hliao created this revision. Oct 14 2020, 10:10 PM

Herald added a project: Restricted Project. Oct 14 2020, 10:10 PM

Herald added subscribers: llvm-commits, pengfei, dmgreen, hiraditya.

hliao requested review of this revision. Oct 14 2020, 10:10 PM

This change triggers a few regression test failures. All of them come from different instruction schedules for memory instructions with multiple memory operands. Please help me double-check that this is the case and whether the new code sequences look better. Thanks.

Harbormaster completed remote builds in B75144: Diff 298300. Oct 14 2020, 11:31 PM

This looks familiar. It is the same as https://reviews.llvm.org/D80161? That patch was apparently reverted in 65cd2c7a8015577fea15c861f41d2e4b5768961f, because it was hitting timeouts on certain targets.

I'm not sure if @Kayjukh remembers more? But it may be worth putting a limit on the total number of alias checks it can perform.

The test changes seem fine to me.

This looks the same as my old patch (https://reviews.llvm.org/D80161). There are additional details available on that commit's Phab page. In a nutshell, some code can trigger a very large number of calls to the aliasing check (see the repro provided by @nemanjai: https://pastebin.com/tRtTQdSa), which results in a very large increase in compilation time.

Bounding the number of checks may be a good solution, even though it would be nicer to have something more clever that could allow all the operands to be checked. Not sure how feasible this would be though.

In D89447#2331763, @dmgreen wrote:

This looks familiar. It is the same as https://reviews.llvm.org/D80161? That patch was apparently reverted in 65cd2c7a8015577fea15c861f41d2e4b5768961f, because it was hitting timeouts on certain targets.

I'm not sure if @Kayjukh remembers more? But it may be worth putting a limit on the total number of alias checks it can perform.

The test changes seem fine to me.

Sorry, I didn't search for the relevant change first. Yes, it serves exactly the same purpose, with minor differences in the check conditions.

In D89447#2331842, @Kayjukh wrote:

This looks like the same as my old patch (https://reviews.llvm.org/D80161). There are additional details available on the commit's Phab page. To put it in a nutshell, some codes can trigger a very large amount of calls to the aliasing check (see the repro provided by @nemanjai: https://pastebin.com/tRtTQdSa), which results in a very large increase in compilation time.

Bounding the number of checks may be a good solution, even though it would be nicer to have something more clever that could allow all the operands to be checked. Not sure how feasible this would be though.

Thanks. I will check that.

In D89447#2331842, @Kayjukh wrote:

This looks like the same as my old patch (https://reviews.llvm.org/D80161). There are additional details available on the commit's Phab page. To put it in a nutshell, some codes can trigger a very large amount of calls to the aliasing check (see the repro provided by @nemanjai: https://pastebin.com/tRtTQdSa), which results in a very large increase in compilation time.

Bounding the number of checks may be a good solution, even though it would be nicer to have something more clever that could allow all the operands to be checked. Not sure how feasible this would be though.

Yeah, I could reproduce that compile-time issue. It happens in the post-RA scheduler. I didn't notice it because the generic x86 backend disables the post-RA scheduler.
Back to the compile-time issue itself: as a short-term solution, putting a limit in place seems reasonable to me, as most native machine instructions probably take only 2~3 memory operands. We may also add a target hook so that the limit can be tuned for the underlying target. If any instruction has more memory operands than that limit, we just treat it conservatively. In the long run, adding alias-result caching in basic-aa would reduce time but increase the space required.
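
The caching idea in the last sentence could look roughly like the sketch below. It only illustrates the time/space trade-off: `CachedAliasQuery` and its integer location ids are hypothetical and do not reflect how basic-aa is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <utility>

// Hypothetical memoizing wrapper around an expensive pairwise alias query.
// Locations are identified by plain integer ids; nothing here is LLVM API.
class CachedAliasQuery {
public:
  using Query = std::function<bool(int, int)>;

  explicit CachedAliasQuery(Query Q) : Slow(std::move(Q)) {
    assert(static_cast<bool>(Slow) && "query must be callable");
  }

  bool mayAlias(int A, int B) {
    // Alias queries are symmetric, so normalize the key order.
    if (A > B)
      std::swap(A, B);
    const std::pair<int, int> Key(A, B);
    auto It = Cache.find(Key);
    if (It != Cache.end())
      return It->second; // Time saved: no repeated slow query.
    ++SlowCalls;
    const bool Result = Slow(A, B);
    Cache.emplace(Key, Result); // Space spent: one entry per distinct pair.
    return Result;
  }

  std::size_t slowCalls() const { return SlowCalls; }
  std::size_t cachedEntries() const { return Cache.size(); }

private:
  Query Slow;
  std::map<std::pair<int, int>, bool> Cache;
  std::size_t SlowCalls = 0;
};
```

Repeated queries for the same pair hit the cache, at the price of one stored entry per distinct pair ever queried.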

A relevant question is whether we need to run full AA in the post-RA scheduler if the pre-RA scheduler is already enabled. Except for newly generated memory operations (spills, reloads, rematerialized loads, etc.), RA only allocates and assigns registers. The memory dependencies after the pre-RA scheduler still hold, so running AA checks on those memory instructions only reproduces the same result. Register spills and reloads won't alias with the existing memory instructions, only with their counterparts. They also don't have much room to be scheduled around because of register dependencies. By pruning this unnecessary memory-dependency setup, we could save a lot of the compilation time spent in the post-RA scheduler.

@jonpa can you check whether the SystemZ test case you added still checks what it was intended to check here?

Add limit on memory operand AA check.

In D89447#2333315, @hliao wrote:

Add limit on memory operand AA check.

With that change, there's almost no measurable difference in compilation time on that reproducer.

lkail added a subscriber: lkail. Oct 15 2020, 3:06 PM

Harbormaster completed remote builds in B75229: Diff 298475. Oct 15 2020, 3:39 PM

In D89447#2332645, @uweigand wrote:

@jonpa can you check whether the SystemZ test case you added still checks what it was intended to check here?

I actually see a problem with this patch: The SystemZ test case passes the original test by not producing any scalar load + vector element insertion instructions. However, now I see that all PFD (prefetch) instructions end up last in the block, whereas before they did not. This may or may not be good if the block is huge and the prefetching is therefore not done too many iterations ahead. Maybe this should be checked with benchmarks to make sure the current prefetching tuning does not lose from this.

The reason seems to be that the SystemZ::PFD instruction is marked both with mayLoad and mayStore, but now this patch looks at the *memory operand* and figures out that it is only loading and therefore does not alias with another load. This check was previously only done with the MI flags. The post-RA scheduler now puts all of them at the end.
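
The difference described here can be modeled with the sketch below. `MemOp`, `Instr`, `needsOrderByFlags`, and `needsOrderByMemOps` are hypothetical names, and address overlap is deliberately ignored (both accesses are assumed to possibly touch the same location) so that only the load/store classification matters.

```cpp
#include <cassert>
#include <vector>

// Hypothetical operand-level classification (stand-in for an MMO).
struct MemOp {
  bool IsLoad;
  bool IsStore;
};

// Hypothetical instruction with instruction-level flags plus operands.
struct Instr {
  bool MayLoad;
  bool MayStore;
  std::vector<MemOp> MemOps;
};

// Flag-based view: two memory instructions must stay ordered if either one
// may store, regardless of what the memory operands say.
inline bool needsOrderByFlags(const Instr &A, const Instr &B) {
  return A.MayStore || B.MayStore;
}

// Operand-based view: a pair of operands forces ordering only if at least
// one of the two operands is a store.
inline bool needsOrderByMemOps(const Instr &A, const Instr &B) {
  for (const MemOp &X : A.MemOps)
    for (const MemOp &Y : B.MemOps)
      if (X.IsStore || Y.IsStore)
        return true;
  return false;
}
```

For an instruction flagged mayLoad and mayStore whose only operand is a load (the PFD situation described above), the flag-based view keeps it ordered against a plain load, while the operand-based view lets the scheduler move it freely.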

Remove the MMO non-store check.

In D89447#2334424, @jonpa wrote:

In D89447#2332645, @uweigand wrote:

@jonpa can you check whether the SystemZ test case you added still checks what it was intended to check here?

I actually see a problem with this patch: The SystemZ test case passes the original test by not producing any scalar load + vector element insertions instructions. However now I see that all PFD (prefetch) instructions end up last in the block, whereas before they were not. This may or not be good if the block is huge and the prefetching is therefore not done too many iterations ahead. Maybe this should be checked with benchmarks to make sure the current prefetching tuning does not loose by this.

The reason seems to be that the SystemZ::PFD instruction is marked both with mayLoad and mayStore, but now this patch looks at the *memory operand* and figures out that it is only loading and therefore does not alias with another load. This check was previously only done with the MI flags. The post-RA scheduler now puts all of them at the end.

Even if I removed the following check, the result is still the same

The real reason is that the prefetch targets data iterations ahead and won't alias with any memory access in that loop body. Previously, we

In D89447#2336415, @hliao wrote:

Remove the MMO non-store check.

In D89447#2334424, @jonpa wrote:

In D89447#2332645, @uweigand wrote:

@jonpa can you check whether the SystemZ test case you added still checks what it was intended to check here?

I actually see a problem with this patch: The SystemZ test case passes the original test by not producing any scalar load + vector element insertions instructions. However now I see that all PFD (prefetch) instructions end up last in the block, whereas before they were not. This may or not be good if the block is huge and the prefetching is therefore not done too many iterations ahead. Maybe this should be checked with benchmarks to make sure the current prefetching tuning does not loose by this.

The reason seems to be that the SystemZ::PFD instruction is marked both with mayLoad and mayStore, but now this patch looks at the *memory operand* and figures out that it is only loading and therefore does not alias with another load. This check was previously only done with the MI flags. The post-RA scheduler now puts all of them at the end.

I removed the check that one of the two MMOs needs to be a store. There is no change to the test now. But it seems a little odd that an instruction claims mayLoad and mayStore but only has an isLoad MMO. Maybe we need to change the MMO in prefetch to be both isLoad and isStore.

Harbormaster completed remote builds in B75402: Diff 298792. Oct 16 2020, 7:09 PM

I removed that check that one of those two MMOs needs to be store. No change to test now. But, that sounds a little weird that one instruction claims mayLoad and mayStore but only has isLoad MMO. Maybe we need to change that MMO in prefetch to be both isLoad and isStore.

The SystemZ Prefetch Data (PFD) instruction has a flag bit to control whether the prefetch is for read or write, so in a way those two flags make sense. However, since PFD does not in fact clobber any memory, I think the idea behind the mayLoad and mayStore flags is to keep the instruction in place in the block as much as possible. It shouldn't necessarily matter, but generally it is probably better to spread out the memory accesses / prefetches rather than having them all happen at once, I would think. If the block is big with many prefetches, they shouldn't all end up at the end of the block...

If you think that check is valuable, then maybe we could try to find another way to keep the PFD instructions in their places. I tried adding hasSideEffects = 1 on PFD/PFDRL instructions instead of removing the check on the MMOs, but that did not seem to be NFC on benchmarks...

In D89447#2336699, @jonpa wrote:

I removed that check that one of those two MMOs needs to be store. No change to test now. But, that sounds a little weird that one instruction claims mayLoad and mayStore but only has isLoad MMO. Maybe we need to change that MMO in prefetch to be both isLoad and isStore.

The SystemZ Prefetch Data (PFD) instruction has a flag bit to control if the prefetch is read or write, so in a way those two flags make sense. However, since the PFD does not in fact clobber any memory I think the idea behind the mayLoad and mayStore flags is to keep the instruction in place in the block as much as possible. It shouldn't necessarily matter, but generally it is probably better spread out the memory accesses / prefetches rather than having them all happen at once, I would think. If the block is big with many prefetches they shouldn't all end up in the end of the block...

If you think that check is valuable, then maybe we could try to find another way to keep the PFD instructions in their places. I tried adding hasSideEffects = 1 on PFD/PFDRL instructions instead of removing the check on the MMOs, but that did not seem to be NFC on benchmarks...

It is not only the target-specific PREFETCH that has this: the target-independent PREFETCH also has both mayLoad and mayStore. I think that may have been added in the early days to honor the program order. Even if a prefetch may prefetch for write, or more precisely for ownership, it doesn't really change that memory at all. The order of PREFETCH needs to be honored by the scheduler because the scheduler doesn't understand the target timing yet.

PING for review.

Just a kind PING for review.

hliao added a reviewer: dmgreen. Oct 22 2020, 1:11 PM

PING for review. As a similar patch was approved, shall I just commit it again now that the compilation time issue is addressed?

Rebase

Harbormaster completed remote builds in B76206: Diff 300304. Oct 23 2020, 9:17 AM

Have the SystemZ prefetch issues been resolved?

llvm/include/llvm/CodeGen/TargetInstrInfo.h
1743	So the limit on the number of alias checks is effectively 16? Would it be worth making the limit apply to the total number of checks, not the MemOperands per instruction? That way an instruction with a single operand could be compared to an instruction with many (which I imagine would be a common case).
llvm/lib/CodeGen/MachineInstr.cpp
1341	Why is this new ordered check needed? I didn't think that atomics had multiple memory operands.

In D89447#2351887, @dmgreen wrote:

Have the SystemZ prefetch issues been resolved?

Yes, no SystemZ tests are affected by this change now.

Rebase

Harbormaster completed remote builds in B76309: Diff 300520. Oct 24 2020, 8:42 PM

Remove unordered check.

hliao marked an inline comment as done. Oct 26 2020, 7:14 AM

hliao added inline comments.

llvm/include/llvm/CodeGen/TargetInstrInfo.h
1743	Just IMHO, the limit here is more straightforward to figure out. The backend needs to consider the corner case where two instructions that each have more than one mem operand are checked.
llvm/lib/CodeGen/MachineInstr.cpp
1341	I added that just for future-proofing, like the load/store check. I have removed both to keep this change as minimal as possible.

Harbormaster completed remote builds in B76396: Diff 300663. Oct 26 2020, 7:49 AM

Kayjukh added inline comments. Oct 26 2020, 11:20 AM

llvm/include/llvm/CodeGen/TargetInstrInfo.h
1743	I think it would make sense to still limit the number of checks instead of the number of memory operands. If, say, you put a limit of 16 checks (which would be equivalent to the 4 memory operands you set there), then you can compare two instructions with 4 memory operands each, but also an instruction with a single memory operand against another one that happens to have 8. The bound on the cost would be the same but it would make the check more flexible.

hliao marked an inline comment as done. Oct 26 2020, 3:01 PM

hliao added inline comments.

llvm/include/llvm/CodeGen/TargetInstrInfo.h
1743	If a target processor has a native instruction with up to 8 mem operands, won't the developer set this limit to 8 instead of 4 to support all combinations of possible native instructions? My question is whether such a limit should be decided mainly by the target processor or by the memory operands that can arise from post-RA optimizations. Which one is more reasonable?

PING for review

Kindly PING again

dmgreen added inline comments. Nov 2 2020, 12:24 AM

llvm/include/llvm/CodeGen/TargetInstrInfo.h
1743	I would make it a total limit. Consider the ARM backend, where we can have 16 MMOs in a single ldm instruction (sort of). You do not want to put the limit here to 16 per instruction, as that would put the total limit to 256! Far too high! Even a limit of 8 per instruction would sound very high. But if the limit was a total of 16, then at least the ldm with 16 operands could be compared against an instruction with 1. And an ldm with 8 operands could be compared against an ldrd with 2.
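
The arithmetic behind this suggestion can be checked with a small sketch. `TargetHooks` and `ManyOperandTarget` are hypothetical stand-ins for a target's TargetInstrInfo; only the hook name getMemOperandAACheckLimit and its default of 16 come from the patch. The per-instruction variant models the earlier approach as described in this thread.

```cpp
#include <cassert>

// Hypothetical stand-in for TargetInstrInfo, exposing only the limit hook.
struct TargetHooks {
  virtual ~TargetHooks() = default;
  // Default taken from the patch: at most 16 pairwise checks in total.
  virtual unsigned getMemOperandAACheckLimit() const { return 16; }
};

// Hypothetical target that chooses to allow more checks.
struct ManyOperandTarget : TargetHooks {
  unsigned getMemOperandAACheckLimit() const override { return 32; }
};

// Per-instruction cap: each instruction may have at most PerInstrLimit
// memory operands, so up to PerInstrLimit * PerInstrLimit checks can run.
inline bool withinPerInstrLimit(unsigned NumA, unsigned NumB,
                                unsigned PerInstrLimit) {
  return NumA <= PerInstrLimit && NumB <= PerInstrLimit;
}

// Total cap: bound the product, i.e. the number of pairwise checks.
inline bool withinTotalLimit(unsigned NumA, unsigned NumB,
                             const TargetHooks &TII) {
  return NumA * NumB <= TII.getMemOperandAACheckLimit();
}
```

With the default of 16, the pairs (16, 1) and (8, 2) fit, while (4, 5) needs 20 checks and is treated conservatively. A per-instruction cap of 4 has the same worst-case cost but rejects the 16-operand ldm outright.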

Change how that limit is set.

hliao marked an inline comment as done. Nov 3 2020, 7:08 AM

hliao added inline comments.

llvm/include/llvm/CodeGen/TargetInstrInfo.h
1743	I see your point. That's reasonable.

Harbormaster completed remote builds in B77408: Diff 302576. Nov 3 2020, 8:25 AM

Thanks. (We get some codesize increases from this in my quick AArch64 testing. The idea should in general be an improvement though and they are not very large).

Please keep an eye on compile time, but other than that LGTM.

This revision is now accepted and ready to land. Nov 3 2020, 9:34 AM

This revision was landed with ongoing or failed builds. Nov 3 2020, 5:49 PM

Closed by commit rG4b1120159274: [MachineInstr] Add support for instructions with multiple memory operands. (authored by hliao).

This revision was automatically updated to reflect the committed changes.

hliao marked an inline comment as done.

hliao added a commit: rG4b1120159274: [MachineInstr] Add support for instructions with multiple memory operands.

Revision Contents

Path (Size)

llvm/include/llvm/CodeGen/TargetInstrInfo.h (15 lines)
llvm/lib/CodeGen/MachineInstr.cpp (145 lines)
llvm/test/CodeGen/AArch64/merge-store-dependency.ll (4 lines)
llvm/test/CodeGen/ARM/big-endian-neon-fp16-bitconv.ll (4 lines)
llvm/test/CodeGen/Thumb2/mve-float32regloops.ll (8 lines)
llvm/test/CodeGen/Thumb2/mve-phireg.ll (10 lines)
llvm/test/CodeGen/Thumb2/mve-vst3.ll (2 lines)
llvm/test/CodeGen/Thumb2/umulo-128-legalisation-lowering.ll (8 lines)
llvm/test/CodeGen/X86/store_op_load_fold2.ll (11 lines)

Diff 302721

llvm/include/llvm/CodeGen/TargetInstrInfo.h

[1,731 unchanged lines hidden] public:
   /// Return the value to use for the MachineCSE's LookAheadLimit,
   /// which is a heuristic used for CSE'ing phys reg defs.
   virtual unsigned getMachineCSELookAheadLimit() const {
     // The default lookahead is small to prevent unprofitable quadratic
     // behavior.
     return 5;
   }

+  /// Return the maximal number of alias checks on memory operands. For
+  /// instructions with more than one memory operands, the alias check on a
+  /// single MachineInstr pair has quadratic overhead and results in
+  /// unacceptable performance in the worst case. The limit here is to clamp
+  /// that maximal checks performed. Usually, that's the product of memory
+  /// operand numbers from that pair of MachineInstr to be checked. For
+  /// instance, with two MachineInstrs with 4 and 5 memory operands
+  /// correspondingly, a total of 20 checks are required. With this limit set to
+  /// 16, their alias check is skipped. We choose to limit the product instead
+  /// of the individual instruction as targets may have special MachineInstrs
+  /// with a considerably high number of memory operands, such as `ldm` in ARM.
+  /// Setting this limit per MachineInstr would result in either too high
+  /// overhead or too rigid restriction.
+  virtual unsigned getMemOperandAACheckLimit() const { return 16; }
+
   /// Return an array that contains the ids of the target indices (used for the
   /// TargetIndex machine operand) and their names.
   ///
   /// MIR Serialization is able to serialize only the target indices that are
   /// defined by this method.
   virtual ArrayRef<std::pair<int, const char *>>
   getSerializableTargetIndices() const {
     return None;
[182 unchanged lines hidden]

llvm/lib/CodeGen/MachineInstr.cpp

[1,270 unchanged lines hidden] bool MachineInstr::mayAlias(AAResults *AA, const MachineInstr &Other,
   // Both instructions must be memory operations to be able to alias.
   if (!mayLoadOrStore() || !Other.mayLoadOrStore())
     return false;

   // Let the target decide if memory accesses cannot possibly overlap.
   if (TII->areMemAccessesTriviallyDisjoint(*this, Other))
     return false;

-  // FIXME: Need to handle multiple memory operands to support all targets.
-  if (!hasOneMemOperand() || !Other.hasOneMemOperand())
+  // Memory operations without memory operands may access anything. Be
+  // conservative and assume `MayAlias`.
+  if (memoperands_empty() || Other.memoperands_empty())
     return true;

-  MachineMemOperand *MMOa = *memoperands_begin();
-  MachineMemOperand *MMOb = *Other.memoperands_begin();
+  // Skip if there are too many memory operands.
+  auto NumChecks = getNumMemOperands() * Other.getNumMemOperands();
+  if (NumChecks > TII->getMemOperandAACheckLimit())
+    return true;
+
+  auto HasAlias = [MFI, AA, UseTBAA](const MachineMemOperand *MMOa,
+                                     const MachineMemOperand *MMOb) {
     // The following interface to AA is fashioned after DAGCombiner::isAlias
     // and operates with MachineMemOperand offset with some important
     // assumptions:
     //   - LLVM fundamentally assumes flat address spaces.
     //   - MachineOperand offset can only result from legalization and
     //     cannot affect queries other than the trivial case of overlap
     //     checking.
     //   - These offsets never wrap and never step outside
     //     of allocated objects.
     //   - There should never be any negative offsets here.
     //
     // FIXME: Modify API to hide this math from "user"
     // Even before we go to AA we can reason locally about some
     // memory objects. It can save compile time, and possibly catch some
     // corner cases not currently covered.

     int64_t OffsetA = MMOa->getOffset();
     int64_t OffsetB = MMOb->getOffset();
     int64_t MinOffset = std::min(OffsetA, OffsetB);

     uint64_t WidthA = MMOa->getSize();
     uint64_t WidthB = MMOb->getSize();
     bool KnownWidthA = WidthA != MemoryLocation::UnknownSize;
     bool KnownWidthB = WidthB != MemoryLocation::UnknownSize;

     const Value *ValA = MMOa->getValue();
     const Value *ValB = MMOb->getValue();
     bool SameVal = (ValA && ValB && (ValA == ValB));
     if (!SameVal) {
       const PseudoSourceValue *PSVa = MMOa->getPseudoValue();
       const PseudoSourceValue *PSVb = MMOb->getPseudoValue();
       if (PSVa && ValB && !PSVa->mayAlias(&MFI))
         return false;
       if (PSVb && ValA && !PSVb->mayAlias(&MFI))
         return false;
       if (PSVa && PSVb && (PSVa == PSVb))
         SameVal = true;
     }

     if (SameVal) {
       if (!KnownWidthA || !KnownWidthB)
         return true;
       int64_t MaxOffset = std::max(OffsetA, OffsetB);
       int64_t LowWidth = (MinOffset == OffsetA) ? WidthA : WidthB;
       return (MinOffset + LowWidth > MaxOffset);
     }

     if (!AA)
       return true;

     if (!ValA || !ValB)
       return true;

     assert((OffsetA >= 0) && "Negative MachineMemOperand offset");
     assert((OffsetB >= 0) && "Negative MachineMemOperand offset");

     int64_t OverlapA = KnownWidthA ? WidthA + OffsetA - MinOffset
                                    : MemoryLocation::UnknownSize;
     int64_t OverlapB = KnownWidthB ? WidthB + OffsetB - MinOffset
                                    : MemoryLocation::UnknownSize;

-  AliasResult AAResult = AA->alias(
-      MemoryLocation(ValA, OverlapA,
+    AliasResult AAResult =
+        AA->alias(MemoryLocation(ValA, OverlapA,
                                  UseTBAA ? MMOa->getAAInfo() : AAMDNodes()),
                   MemoryLocation(ValB, OverlapB,
                                  UseTBAA ? MMOb->getAAInfo() : AAMDNodes()));

     return (AAResult != NoAlias);
+  };
+
+  // Check each pair of memory operands from both instructions, which can't
+  // alias only if all pairs won't alias.
+  for (auto *MMOa : memoperands())
+    for (auto *MMOb : Other.memoperands())
+      if (HasAlias(MMOa, MMOb))
+        return true;
+
+  return false;
 }

 /// hasOrderedMemoryRef - Return true if this instruction may have an ordered
 /// or volatile memory reference, or if the information describing the memory
 /// reference is not available. Return false if it is known to have no ordered
 /// memory references.
 bool MachineInstr::hasOrderedMemoryRef() const {
   // An instruction known never to access memory won't have a volatile access.
[915 unchanged lines hidden]

llvm/test/CodeGen/AArch64/merge-store-dependency.ll

[13 unchanged lines hidden]
 ; A53-NEXT: .cfi_def_cfa_offset 16
 ; A53-NEXT: .cfi_offset w19, -8
 ; A53-NEXT: .cfi_offset w30, -16
 ; A53-NEXT: movi v0.2d, #0000000000000000
 ; A53-NEXT: mov x8, x0
 ; A53-NEXT: mov x19, x8
 ; A53-NEXT: mov w0, w1
 ; A53-NEXT: mov w9, #256
+; A53-NEXT: stp x2, x3, [x8, #32]
+; A53-NEXT: mov x2, x8
 ; A53-NEXT: str q0, [x19, #16]!
 ; A53-NEXT: str w1, [x19]
 ; A53-NEXT: mov w1, #4
-; A53-NEXT: stp x2, x3, [x8, #32]
-; A53-NEXT: mov x2, x8
 ; A53-NEXT: str q0, [x8]
 ; A53-NEXT: strh w9, [x8, #24]
 ; A53-NEXT: str wzr, [x8, #20]
 ; A53-NEXT: bl fcntl
 ; A53-NEXT: adrp x9, gv0
 ; A53-NEXT: add x9, x9, :lo12:gv0
 ; A53-NEXT: cmp x19, x9
 ; A53-NEXT: b.eq .LBB0_4
[176 unchanged lines hidden]

llvm/test/CodeGen/ARM/big-endian-neon-fp16-bitconv.ll

	Show First 20 Lines • Show All 497 Lines • ▼ Show 20 Lines
	; CHECK-NEXT: vrev64.16 q8, q8			; CHECK-NEXT: vrev64.16 q8, q8
	; CHECK-NEXT: vadd.f16 q8, q9, q8			; CHECK-NEXT: vadd.f16 q8, q9, q8
	; CHECK-NEXT: vrev32.16 q8, q8			; CHECK-NEXT: vrev32.16 q8, q8
	; CHECK-NEXT: vmov.32 r12, d17[1]			; CHECK-NEXT: vmov.32 r12, d17[1]
	; CHECK-NEXT: vmov.32 r2, d17[0]			; CHECK-NEXT: vmov.32 r2, d17[0]
	; CHECK-NEXT: vmov.32 r3, d16[1]			; CHECK-NEXT: vmov.32 r3, d16[1]
	; CHECK-NEXT: vmov.32 r1, d16[0]			; CHECK-NEXT: vmov.32 r1, d16[0]
	; CHECK-NEXT: subs r12, r12, #1			; CHECK-NEXT: subs r12, r12, #1
				; CHECK-NEXT: str r12, [r0, #12]
	; CHECK-NEXT: sbcs r2, r2, #0			; CHECK-NEXT: sbcs r2, r2, #0
				; CHECK-NEXT: str r2, [r0, #8]
	; CHECK-NEXT: sbcs r3, r3, #0			; CHECK-NEXT: sbcs r3, r3, #0
	; CHECK-NEXT: sbc r1, r1, #0			; CHECK-NEXT: sbc r1, r1, #0
	; CHECK-NEXT: stm r0, {r1, r3}			; CHECK-NEXT: stm r0, {r1, r3}
	; CHECK-NEXT: str r2, [r0, #8]
	; CHECK-NEXT: str r12, [r0, #12]
	; CHECK-NEXT: bx lr			; CHECK-NEXT: bx lr
	; CHECK-NEXT: .p2align 4			; CHECK-NEXT: .p2align 4
	; CHECK-NEXT: @ %bb.1:			; CHECK-NEXT: @ %bb.1:
	; CHECK-NEXT: .LCPI18_0:			; CHECK-NEXT: .LCPI18_0:
	; CHECK-NEXT: .short 0xbc00 @ half -1			; CHECK-NEXT: .short 0xbc00 @ half -1
	; CHECK-NEXT: .short 0x3c00 @ half 1			; CHECK-NEXT: .short 0x3c00 @ half 1
	; CHECK-NEXT: .short 0xbc00 @ half -1			; CHECK-NEXT: .short 0xbc00 @ half -1
	; CHECK-NEXT: .short 0x3c00 @ half 1			; CHECK-NEXT: .short 0x3c00 @ half 1

llvm/test/CodeGen/Thumb2/mve-float32regloops.ll

	; CHECK-NEXT: @ Child Loop BB16_10 Depth 2			; CHECK-NEXT: @ Child Loop BB16_10 Depth 2
	; CHECK-NEXT: vldrw.u32 q0, [r1], #16			; CHECK-NEXT: vldrw.u32 q0, [r1], #16
	; CHECK-NEXT: ldrd r7, r4, [r12]			; CHECK-NEXT: ldrd r7, r4, [r12]
	; CHECK-NEXT: ldrd r0, r6, [r12, #8]			; CHECK-NEXT: ldrd r0, r6, [r12, #8]
	; CHECK-NEXT: ldrd r3, lr, [r12, #16]			; CHECK-NEXT: ldrd r3, lr, [r12, #16]
	; CHECK-NEXT: ldrd r11, r8, [r12, #24]			; CHECK-NEXT: ldrd r11, r8, [r12, #24]
	; CHECK-NEXT: vstrb.8 q0, [r9], #16			; CHECK-NEXT: vstrb.8 q0, [r9], #16
	; CHECK-NEXT: vldrw.u32 q0, [r5], #32			; CHECK-NEXT: vldrw.u32 q0, [r5], #32
				; CHECK-NEXT: strd r9, r1, [sp, #24] @ 8-byte Folded Spill
	; CHECK-NEXT: vldrw.u32 q1, [r5, #-28]			; CHECK-NEXT: vldrw.u32 q1, [r5, #-28]
	; CHECK-NEXT: vmul.f32 q0, q0, r7			; CHECK-NEXT: vmul.f32 q0, q0, r7
	; CHECK-NEXT: vldrw.u32 q6, [r5, #-24]			; CHECK-NEXT: vldrw.u32 q6, [r5, #-24]
	; CHECK-NEXT: vldrw.u32 q4, [r5, #-20]			; CHECK-NEXT: vldrw.u32 q4, [r5, #-20]
	; CHECK-NEXT: vfma.f32 q0, q1, r4			; CHECK-NEXT: vfma.f32 q0, q1, r4
	; CHECK-NEXT: vldrw.u32 q5, [r5, #-16]			; CHECK-NEXT: vldrw.u32 q5, [r5, #-16]
	; CHECK-NEXT: vfma.f32 q0, q6, r0			; CHECK-NEXT: vfma.f32 q0, q6, r0
	; CHECK-NEXT: vldrw.u32 q2, [r5, #-12]			; CHECK-NEXT: vldrw.u32 q2, [r5, #-12]
	; CHECK-NEXT: vfma.f32 q0, q4, r6			; CHECK-NEXT: vfma.f32 q0, q4, r6
	; CHECK-NEXT: vldrw.u32 q3, [r5, #-8]			; CHECK-NEXT: vldrw.u32 q3, [r5, #-8]
	; CHECK-NEXT: vfma.f32 q0, q5, r3			; CHECK-NEXT: vfma.f32 q0, q5, r3
	; CHECK-NEXT: vldrw.u32 q1, [r5, #-4]
	; CHECK-NEXT: vfma.f32 q0, q2, lr
	; CHECK-NEXT: ldr r0, [sp, #20] @ 4-byte Reload			; CHECK-NEXT: ldr r0, [sp, #20] @ 4-byte Reload
				; CHECK-NEXT: vfma.f32 q0, q2, lr
				; CHECK-NEXT: vldrw.u32 q1, [r5, #-4]
	; CHECK-NEXT: vfma.f32 q0, q3, r11			; CHECK-NEXT: vfma.f32 q0, q3, r11
	; CHECK-NEXT: strd r9, r1, [sp, #24] @ 8-byte Folded Spill
	; CHECK-NEXT: vfma.f32 q0, q1, r8
	; CHECK-NEXT: cmp r0, #16			; CHECK-NEXT: cmp r0, #16
				; CHECK-NEXT: vfma.f32 q0, q1, r8
	; CHECK-NEXT: blo .LBB16_7			; CHECK-NEXT: blo .LBB16_7
	; CHECK-NEXT: @ %bb.5: @ %for.body.preheader			; CHECK-NEXT: @ %bb.5: @ %for.body.preheader
	; CHECK-NEXT: @ in Loop: Header=BB16_4 Depth=1			; CHECK-NEXT: @ in Loop: Header=BB16_4 Depth=1
	; CHECK-NEXT: ldr.w lr, [sp, #4] @ 4-byte Reload			; CHECK-NEXT: ldr.w lr, [sp, #4] @ 4-byte Reload
	; CHECK-NEXT: dls lr, lr			; CHECK-NEXT: dls lr, lr
	; CHECK-NEXT: ldr r7, [sp, #8] @ 4-byte Reload			; CHECK-NEXT: ldr r7, [sp, #8] @ 4-byte Reload
	; CHECK-NEXT: .LBB16_6: @ %for.body			; CHECK-NEXT: .LBB16_6: @ %for.body
	; CHECK-NEXT: @ Parent Loop BB16_4 Depth=1			; CHECK-NEXT: @ Parent Loop BB16_4 Depth=1

llvm/test/CodeGen/Thumb2/mve-phireg.ll

	; CHECK-NEXT: vmov r6, s3			; CHECK-NEXT: vmov r6, s3
	; CHECK-NEXT: ldr r0, [r7, #4]!			; CHECK-NEXT: ldr r0, [r7, #4]!
	; CHECK-NEXT: movw r4, :lower16:e			; CHECK-NEXT: movw r4, :lower16:e
	; CHECK-NEXT: vmov.32 q4[0], r5			; CHECK-NEXT: vmov.32 q4[0], r5
	; CHECK-NEXT: movt r4, :upper16:e			; CHECK-NEXT: movt r4, :upper16:e
	; CHECK-NEXT: vmov q1, q4			; CHECK-NEXT: vmov q1, q4
	; CHECK-NEXT: vmov s1, r7			; CHECK-NEXT: vmov s1, r7
	; CHECK-NEXT: vmov.32 q1[1], r6			; CHECK-NEXT: vmov.32 q1[1], r6
	; CHECK-NEXT: mov.w r10, #0
	; CHECK-NEXT: vmov.32 q1[2], r5
	; CHECK-NEXT: vmov.32 q5[0], r7			; CHECK-NEXT: vmov.32 q5[0], r7
				; CHECK-NEXT: vmov.32 q1[2], r5
				; CHECK-NEXT: vmov s9, r4
	; CHECK-NEXT: vmov.32 q1[3], r4			; CHECK-NEXT: vmov.32 q1[3], r4
	; CHECK-NEXT: strd r0, r10, [sp, #24]			; CHECK-NEXT: vdup.32 q6, r7
	; CHECK-NEXT: vstrw.32 q1, [sp, #76]			; CHECK-NEXT: vstrw.32 q1, [sp, #76]
	; CHECK-NEXT: vmov q1, q5			; CHECK-NEXT: vmov q1, q5
	; CHECK-NEXT: vmov s9, r4
	; CHECK-NEXT: vmov.32 q1[1], r7			; CHECK-NEXT: vmov.32 q1[1], r7
	; CHECK-NEXT: vdup.32 q6, r7
	; CHECK-NEXT: vmov.f32 s2, s1			; CHECK-NEXT: vmov.f32 s2, s1
	; CHECK-NEXT: vmov.f32 s8, s0			; CHECK-NEXT: vmov.f32 s8, s0
	; CHECK-NEXT: vmov.32 q1[2], r6			; CHECK-NEXT: vmov.32 q1[2], r6
	; CHECK-NEXT: vmov q3, q6			; CHECK-NEXT: vmov q3, q6
	; CHECK-NEXT: vmov q7, q6			; CHECK-NEXT: vmov q7, q6
	; CHECK-NEXT: vmov.f32 s10, s1			; CHECK-NEXT: vmov.f32 s10, s1
	; CHECK-NEXT: mov.w r8, #4			; CHECK-NEXT: mov.w r8, #4
				; CHECK-NEXT: mov.w r10, #0
	; CHECK-NEXT: vmov.32 q1[3], r4			; CHECK-NEXT: vmov.32 q1[3], r4
	; CHECK-NEXT: vmov.32 q3[0], r4			; CHECK-NEXT: vmov.32 q3[0], r4
	; CHECK-NEXT: vmov.32 q7[1], r4			; CHECK-NEXT: vmov.32 q7[1], r4
	; CHECK-NEXT: str r1, [r0]			; CHECK-NEXT: str r1, [r0]
	; CHECK-NEXT: vmov.f32 s11, s3			; CHECK-NEXT: vmov.f32 s11, s3
	; CHECK-NEXT: movs r1, #64			; CHECK-NEXT: movs r1, #64
	; CHECK-NEXT: strh.w r8, [sp, #390]			; CHECK-NEXT: strh.w r8, [sp, #390]
				; CHECK-NEXT: strd r0, r10, [sp, #24]
	; CHECK-NEXT: vstrw.32 q0, [sp, #44]			; CHECK-NEXT: vstrw.32 q0, [sp, #44]
	; CHECK-NEXT: str r0, [r0]			; CHECK-NEXT: str r0, [r0]
	; CHECK-NEXT: vstrw.32 q2, [r0]			; CHECK-NEXT: vstrw.32 q2, [r0]
	; CHECK-NEXT: vstrw.32 q7, [r0]			; CHECK-NEXT: vstrw.32 q7, [r0]
	; CHECK-NEXT: vstrw.32 q3, [r0]			; CHECK-NEXT: vstrw.32 q3, [r0]
	; CHECK-NEXT: vstrw.32 q1, [r0]			; CHECK-NEXT: vstrw.32 q1, [r0]
	; CHECK-NEXT: bl __aeabi_memclr4			; CHECK-NEXT: bl __aeabi_memclr4
	; CHECK-NEXT: vmov.32 q5[1], r5			; CHECK-NEXT: vmov.32 q5[1], r5

llvm/test/CodeGen/Thumb2/mve-vst3.ll

	; CHECK-NEXT: vmov.f32 s8, s7			; CHECK-NEXT: vmov.f32 s8, s7
	; CHECK-NEXT: vmov.f32 s10, s1			; CHECK-NEXT: vmov.f32 s10, s1
	; CHECK-NEXT: vmov r2, s8			; CHECK-NEXT: vmov r2, s8
	; CHECK-NEXT: vmov r0, s10			; CHECK-NEXT: vmov r0, s10
	; CHECK-NEXT: vmov.f64 d4, d2			; CHECK-NEXT: vmov.f64 d4, d2
	; CHECK-NEXT: vmov.f32 s9, s6			; CHECK-NEXT: vmov.f32 s9, s6
	; CHECK-NEXT: vmov.f32 s10, s0			; CHECK-NEXT: vmov.f32 s10, s0
	; CHECK-NEXT: vmov.f32 s11, s5			; CHECK-NEXT: vmov.f32 s11, s5
	; CHECK-NEXT: strd r2, r0, [r1, #16]
	; CHECK-NEXT: vstrw.32 q2, [r1]			; CHECK-NEXT: vstrw.32 q2, [r1]
				; CHECK-NEXT: strd r2, r0, [r1, #16]
	; CHECK-NEXT: pop {r4, pc}			; CHECK-NEXT: pop {r4, pc}
	entry:			entry:
	%s1 = getelementptr <2 x i32>, <2 x i32>* %src, i32 0			%s1 = getelementptr <2 x i32>, <2 x i32>* %src, i32 0
	%l1 = load <2 x i32>, <2 x i32>* %s1, align 4			%l1 = load <2 x i32>, <2 x i32>* %s1, align 4
	%s2 = getelementptr <2 x i32>, <2 x i32>* %src, i32 1			%s2 = getelementptr <2 x i32>, <2 x i32>* %src, i32 1
	%l2 = load <2 x i32>, <2 x i32>* %s2, align 4			%l2 = load <2 x i32>, <2 x i32>* %s2, align 4
	%s3 = getelementptr <2 x i32>, <2 x i32>* %src, i32 2			%s3 = getelementptr <2 x i32>, <2 x i32>* %src, i32 2
	%l3 = load <2 x i32>, <2 x i32>* %s3, align 4			%l3 = load <2 x i32>, <2 x i32>* %s3, align 4
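
In the hunk above, the `strd r2, r0, [r1, #16]` and `vstrw.32 q2, [r1]` stores swap places, which is only safe if the two instructions write disjoint bytes. The following is a minimal sketch of a pairwise overlap check of that kind, not the code from this patch. `MemRange`, `rangesOverlap`, and `mayAlias` are hypothetical names, and the sketch assumes every operand is a byte range off one shared base register, and that an instruction with no recorded operands must be treated as touching anything.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a memory operand: the byte range
// [offset, offset + size) relative to one shared base pointer.
struct MemRange {
  int64_t offset;
  uint64_t size;
};

// Two ranges off the same base overlap iff each one starts before the
// other one ends. Zero-sized ranges are rejected as a precondition.
bool rangesOverlap(const MemRange &a, const MemRange &b) {
  assert(a.size > 0 && b.size > 0);
  return a.offset < b.offset + static_cast<int64_t>(b.size) &&
         b.offset < a.offset + static_cast<int64_t>(a.size);
}

// Conservative check over every pair of operands from two instructions.
// An empty operand list is assumed to mean "may touch anything".
bool mayAlias(const std::vector<MemRange> &a,
              const std::vector<MemRange> &b) {
  if (a.empty() || b.empty())
    return true;
  for (const MemRange &x : a)
    for (const MemRange &y : b)
      if (rangesOverlap(x, y))
        return true;
  return false;
}
```

Modelling the `vstrw.32` as one 16-byte range at offset 0 and the `strd` as two 4-byte ranges at offsets 16 and 20 (an assumption about how its operands are recorded), `mayAlias` finds no overlapping pair, which is consistent with the reordering in the updated checks. The nested loop is quadratic in the operand counts, so a real implementation may want a cap on the number of pairs it examines.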

llvm/test/CodeGen/Thumb2/umulo-128-legalisation-lowering.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=thumbv7-unknown-none-gnueabi \| FileCheck %s --check-prefixes=THUMBV7			; RUN: llc < %s -mtriple=thumbv7-unknown-none-gnueabi \| FileCheck %s --check-prefixes=THUMBV7

	define { i128, i8 } @muloti_test(i128 %l, i128 %r) unnamed_addr #0 {			define { i128, i8 } @muloti_test(i128 %l, i128 %r) unnamed_addr #0 {
	; THUMBV7-LABEL: muloti_test:			; THUMBV7-LABEL: muloti_test:
	; THUMBV7: @ %bb.0: @ %start			; THUMBV7: @ %bb.0: @ %start
	; THUMBV7-NEXT: .save {r4, r5, r6, r7, r8, r9, r10, r11, lr}			; THUMBV7-NEXT: .save {r4, r5, r6, r7, r8, r9, r10, r11, lr}
	; THUMBV7-NEXT: push.w {r4, r5, r6, r7, r8, r9, r10, r11, lr}			; THUMBV7-NEXT: push.w {r4, r5, r6, r7, r8, r9, r10, r11, lr}
	; THUMBV7-NEXT: .pad #44			; THUMBV7-NEXT: .pad #44
	; THUMBV7-NEXT: sub sp, #44			; THUMBV7-NEXT: sub sp, #44
	; THUMBV7-NEXT: ldrd r4, r7, [sp, #88]
	; THUMBV7-NEXT: mov r5, r3
	; THUMBV7-NEXT: str r0, [sp, #40] @ 4-byte Spill			; THUMBV7-NEXT: str r0, [sp, #40] @ 4-byte Spill
	; THUMBV7-NEXT: movs r0, #0			; THUMBV7-NEXT: movs r0, #0
	; THUMBV7-NEXT: strd r4, r7, [sp]			; THUMBV7-NEXT: ldrd r4, r7, [sp, #88]
	; THUMBV7-NEXT: mov r1, r3			; THUMBV7-NEXT: mov r5, r3
	; THUMBV7-NEXT: strd r0, r0, [sp, #8]			; THUMBV7-NEXT: strd r0, r0, [sp, #8]
				; THUMBV7-NEXT: mov r1, r3
	; THUMBV7-NEXT: mov r6, r2			; THUMBV7-NEXT: mov r6, r2
	; THUMBV7-NEXT: mov r0, r2			; THUMBV7-NEXT: mov r0, r2
	; THUMBV7-NEXT: movs r2, #0			; THUMBV7-NEXT: movs r2, #0
	; THUMBV7-NEXT: movs r3, #0			; THUMBV7-NEXT: movs r3, #0
				; THUMBV7-NEXT: strd r4, r7, [sp]
	; THUMBV7-NEXT: bl __multi3			; THUMBV7-NEXT: bl __multi3
	; THUMBV7-NEXT: strd r1, r0, [sp, #32] @ 8-byte Folded Spill			; THUMBV7-NEXT: strd r1, r0, [sp, #32] @ 8-byte Folded Spill
	; THUMBV7-NEXT: strd r3, r2, [sp, #24] @ 8-byte Folded Spill			; THUMBV7-NEXT: strd r3, r2, [sp, #24] @ 8-byte Folded Spill
	; THUMBV7-NEXT: ldrd r2, r0, [sp, #96]			; THUMBV7-NEXT: ldrd r2, r0, [sp, #96]
	; THUMBV7-NEXT: ldr.w r9, [sp, #80]			; THUMBV7-NEXT: ldr.w r9, [sp, #80]
	; THUMBV7-NEXT: umull lr, r0, r0, r6			; THUMBV7-NEXT: umull lr, r0, r0, r6
	; THUMBV7-NEXT: ldr.w r11, [sp, #84]			; THUMBV7-NEXT: ldr.w r11, [sp, #84]
	; THUMBV7-NEXT: umull r3, r1, r5, r2			; THUMBV7-NEXT: umull r3, r1, r5, r2

llvm/test/CodeGen/X86/store_op_load_fold2.ll

cond_true2732.preheader: ; preds = %entry
%tmp2670.us.us = load i64, i64* null ; <i64> [#uses=1]		%tmp2670.us.us = load i64, i64* null ; <i64> [#uses=1]
%shift.upgrd.1 = zext i8 %tmp2674 to i64 ; <i64> [#uses=1]		%shift.upgrd.1 = zext i8 %tmp2674 to i64 ; <i64> [#uses=1]
%tmp2675.us.us = shl i64 %tmp2670.us.us, %shift.upgrd.1 ; <i64> [#uses=1]		%tmp2675.us.us = shl i64 %tmp2670.us.us, %shift.upgrd.1 ; <i64> [#uses=1]
%tmp2675not.us.us = xor i64 %tmp2675.us.us, -1 ; <i64> [#uses=1]		%tmp2675not.us.us = xor i64 %tmp2675.us.us, -1 ; <i64> [#uses=1]
%tmp2676.us.us = and i64 %tmp2667.us.us, %tmp2675not.us.us ; <i64> [#uses=1]		%tmp2676.us.us = and i64 %tmp2667.us.us, %tmp2675not.us.us ; <i64> [#uses=1]
store i64 %tmp2676.us.us, i64* %tmp2666		store i64 %tmp2676.us.us, i64* %tmp2666
ret i32 0		ret i32 0

; INTEL: and {{e..}}, dword ptr [356]		; INTEL-DAG: and {{e..}}, dword ptr [356]
; INTEL: and dword ptr [360], {{e..}}		; INTEL-DAG: and dword ptr [360], {{e..}}
; FIXME: mov dword ptr [356], {{e..}}		; INTEL: mov dword ptr [356], {{e..}}
; The above line comes out as 'mov 360, eax', but when the register is ecx it works?

; ATT: andl 356, %{{e..}}		; ATT-DAG: andl 356, %{{e..}}
; ATT: andl %{{e..}}, 360		; ATT-DAG: andl %{{e..}}, 360
; ATT: movl %{{e..}}, 356		; ATT: movl %{{e..}}, 356

}		}

Revision Contents

Diff 302721

llvm/include/llvm/CodeGen/TargetInstrInfo.h

llvm/lib/CodeGen/MachineInstr.cpp

llvm/test/CodeGen/AArch64/merge-store-dependency.ll

llvm/test/CodeGen/ARM/big-endian-neon-fp16-bitconv.ll

llvm/test/CodeGen/Thumb2/mve-float32regloops.ll

llvm/test/CodeGen/Thumb2/mve-phireg.ll

llvm/test/CodeGen/Thumb2/mve-vst3.ll

llvm/test/CodeGen/Thumb2/umulo-128-legalisation-lowering.ll

llvm/test/CodeGen/X86/store_op_load_fold2.ll
