This is an archive of the discontinued LLVM Phabricator instance.

Differential D102255

[SelectionDAG] Generate scoped AA metadata when lowering memcpy.
Needs Review · Public

Authored by hliao on May 11 2021, 10:03 AM.


Details

Reviewers

gchatelet
hfinkel
bogner
t.p.northover
nemanjai
nikic
fhahn
jeroen.dobbelaere

Summary

When memcpy is lowered in SelectionDAG, similar to inlining a function, scoped AA metadata should be generated and attached on the lowered loads/stores following 'noalias' arguments.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

hliao created this revision. May 11 2021, 10:03 AM

Herald added subscribers: ecnelises, hiraditya, nemanjai. May 11 2021, 10:03 AM

hliao requested review of this revision. May 11 2021, 10:03 AM

Herald added a project: Restricted Project. May 11 2021, 10:03 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B103774: Diff 344452. May 11 2021, 10:04 AM

hliao added reviewers: gchatelet, hfinkel, bogner, t.p.northover, nemanjai. May 11 2021, 10:06 AM

hliao added inline comments. May 11 2021, 10:14 AM

llvm/test/CodeGen/AArch64/arm64-2012-05-07-MemcpyAlignBug.ll
12	With the extra scoped AA info, this store is free to be scheduled earlier to minimize register pressure, since `cyclone` disables the latency heuristic.
llvm/test/CodeGen/AArch64/arm64-memcpy-inline.ll
33	Similar change: the 'cyclone' scheduling model favors register pressure over latency.
llvm/test/CodeGen/AArch64/arm64-misaligned-memcpy-inline.ll
36	Functionally the same. The new order of offsets looks better in terms of locality.
llvm/test/CodeGen/PowerPC/pr45301.ll
29	Functionally the same, but the loads from that memcpy are scheduled earlier to favor latency hiding.

Do I understand correctly that this patch is trying to claim that the memcpy src and dst do not alias through scoped alias metadata? If so, I'm afraid this is incorrect. Our contract for llvm.memcpy requires that src/dst are either NoAlias or MustAlias (but not PartialAlias). From LangRef:

The ‘llvm.memcpy.*’ intrinsics copy a block of memory from the source location to the destination location, which must either be equal or non-overlapping.

In D102255#2751896, @nikic wrote:

Do I understand correctly that this patch is trying to claim that the memcpy src and dst do not alias through scoped alias metadata? If so, I'm afraid this is incorrect. Our contract for llvm.memcpy requires that src/dst are either NoAlias or MustAlias (but not PartialAlias). From LangRef:

The ‘llvm.memcpy.*’ intrinsics copy a block of memory from the source location to the destination location, which must either be equal or non-overlapping.

The LangRef says: "The ‘llvm.memcpy.*’ intrinsics copy a block of memory from the source location to the destination location, which must either be equal or non-overlapping. It copies “len” bytes of memory over. If the argument is known to be aligned to some boundary, this can be specified as an attribute on the argument." That is, src and dst are either equal or non-overlapping. Memcpys in the equal case should have been eliminated earlier. Even if some remain, claiming 'NoAlias' for them doesn't cause a correctness issue when copying: scheduling those loads and stores freely stays correct.
In addition, the extra AA scope metadata here only affects this instance of memcpy.

It does look like a valid way to indicate that the individual loads and stores are independent, except for their value dependency.

Given the very late introduction of new !alias.scope and !noalias metadata, is there a way to have a test case look at a machine IR dump, together with the metadata output at that phase?

In D102255#2752039, @jeroen.dobbelaere wrote:

It does look like a valid way to indicate that the individual loads and stores are independent, except for their value dependency.

Given the very late introduction of new !alias.scope and !noalias metadata, is there a way to have a test case look at a machine IR dump, together with the metadata output at that phase?

Enabling the dumping of scoped AA metadata in MIR is on my plan. This AA metadata is generated within the backend and needs special support to be dumped in a readable form.

gchatelet retitled this revision from [SelectionDAG] Generate scoped AA metadata when loweing memcpy. to [SelectionDAG] Generate scoped AA metadata when lowering memcpy. May 12 2021, 12:27 AM

In D102255#2751961, @hliao wrote:

In D102255#2751896, @nikic wrote:

Do I understand correctly that this patch is trying to claim that the memcpy src and dst do not alias through scoped alias metadata? If so, I'm afraid this is incorrect. Our contract for llvm.memcpy requires that src/dst are either NoAlias or MustAlias (but not PartialAlias). From LangRef:

The ‘llvm.memcpy.*’ intrinsics copy a block of memory from the source location to the destination location, which must either be equal or non-overlapping.

The LangRef says: "The ‘llvm.memcpy.*’ intrinsics copy a block of memory from the source location to the destination location, which must either be equal or non-overlapping. It copies “len” bytes of memory over. If the argument is known to be aligned to some boundary, this can be specified as an attribute on the argument." That is, src and dst are either equal or non-overlapping. Memcpys in the equal case should have been eliminated earlier.

Generally, you do not know whether they are equal or not.

Even if some remain, claiming 'NoAlias' for them doesn't cause a correctness issue when copying: scheduling those loads and stores freely stays correct.

I agree that rescheduling the loads/stores is correct even if src and dst are equal. However, the metadata itself is still incorrect: It will claim that the loads/stores are NoAlias, even though they are actually MustAlias.
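The point both sides agree on here can be checked with a small standalone C++ sketch. It is not LLVM code: `Chunk`, `Op`, `runSchedule`, and `scheduleCopiesCorrectly` are hypothetical names, and the 23-byte layout is borrowed from the example discussed later in this thread. It models a lowered memcpy as load/store pairs and runs them in an arbitrary order, enforcing only the value dependency between a store and its own load.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// One lowered load/store pair covering bytes [offset, offset + width).
struct Chunk {
  std::size_t offset;
  std::size_t width;
};

// One scheduled operation: the load or the store of a given pair.
struct Op {
  bool isStore;
  std::size_t chunk;
};

// Hypothetical helper: executes the lowered ops in the given order. The only
// ordering enforced is the value dependency (a store follows its own load).
void runSchedule(std::uint8_t *dst, const std::uint8_t *src,
                 const std::vector<Chunk> &chunks,
                 const std::vector<Op> &schedule) {
  std::vector<std::vector<std::uint8_t>> regs(chunks.size());
  std::vector<bool> loaded(chunks.size(), false);
  for (const Op &op : schedule) {
    const Chunk &c = chunks[op.chunk];
    if (!op.isStore) {
      regs[op.chunk].assign(src + c.offset, src + c.offset + c.width);
      loaded[op.chunk] = true;
    } else {
      assert(loaded[op.chunk] && "a store must follow its own load");
      std::memcpy(dst + c.offset, regs[op.chunk].data(), c.width);
    }
  }
}

// Returns true if the schedule leaves the destination equal to the original
// source bytes. With sameBuffer set, src and dst are the same 23-byte buffer.
bool scheduleCopiesCorrectly(bool sameBuffer,
                             const std::vector<Op> &schedule) {
  std::vector<std::uint8_t> src(23), other(23, 0);
  for (std::size_t i = 0; i < src.size(); ++i)
    src[i] = static_cast<std::uint8_t>(7 * i + 1);
  const std::vector<std::uint8_t> expected = src;
  std::vector<std::uint8_t> &dst = sameBuffer ? src : other;
  const std::vector<Chunk> chunks = {{0, 8}, {8, 8}, {15, 8}};
  runSchedule(dst.data(), src.data(), chunks, schedule);
  return dst == expected;
}
```

The sketch only covers the two cases LangRef permits (equal or non-overlapping). It says nothing about whether the metadata itself is accurate, which is the objection raised above.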

I propose to discuss this in tomorrow's aa tech call.

In D102255#2755372, @nikic wrote:

I agree that rescheduling the loads/stores is correct even if src and dst are equal. However, the metadata itself is still incorrect: It will claim that the loads/stores are NoAlias, even though they are actually MustAlias.

The proposed version introduces 2 scopes, assigning all stores to one scope and all loads to another.

In order to avoid the MustAlias vs NoAlias problem, we could introduce a scope per load-store pair. Of course, that might be a lot of extra scopes.
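The difference between the two schemes can be sketched with a simplified, single-domain model of scoped no-alias queries in standalone C++. This is not LLVM's implementation: `Access`, `provablyNoAlias`, `twoScopeScheme`, and `perPairScheme` are hypothetical names, and the per-pair metadata layout (each pair lists every other pair's scope in its no-alias set) is one assumed reading of the proposal.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

using ScopeSet = std::set<int>;

// Simplified model of the metadata on one memory access (single domain).
struct Access {
  ScopeSet aliasScope; // models !alias.scope
  ScopeSet noAlias;    // models !noalias
};

// True if `scopes` is non-empty and fully contained in `noAlias`.
static bool covers(const ScopeSet &noAlias, const ScopeSet &scopes) {
  return !scopes.empty() && std::includes(noAlias.begin(), noAlias.end(),
                                          scopes.begin(), scopes.end());
}

// NoAlias is claimed if either access's scopes are covered by the other
// access's no-alias set.
bool provablyNoAlias(const Access &a, const Access &b) {
  return covers(b.noAlias, a.aliasScope) || covers(a.noAlias, b.aliasScope);
}

struct LoweredPair {
  Access load;
  Access store;
};

// Scheme in the patch: one "Src" scope for all loads, one "Dst" scope for
// all stores, each declared no-alias with the other.
std::vector<LoweredPair> twoScopeScheme(int numPairs) {
  const int Dst = 0, Src = 1;
  std::vector<LoweredPair> pairs;
  for (int i = 0; i < numPairs; ++i)
    pairs.push_back({Access{{Src}, {Dst}}, Access{{Dst}, {Src}}});
  return pairs;
}

// Assumed reading of the per-pair proposal: pair i lives in scope i and is
// declared no-alias with every other pair's scope.
std::vector<LoweredPair> perPairScheme(int numPairs) {
  assert(numPairs >= 0);
  std::vector<LoweredPair> pairs;
  for (int i = 0; i < numPairs; ++i) {
    ScopeSet others;
    for (int j = 0; j < numPairs; ++j)
      if (j != i)
        others.insert(j);
    Access a{{i}, others};
    pairs.push_back({a, a});
  }
  return pairs;
}
```

Under this model, the two-scope scheme claims NoAlias even between a load and its own store, while the per-pair scheme makes no claim for that pair and still separates different pairs.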

In D102255#2765788, @jeroen.dobbelaere wrote:

In D102255#2755372, @nikic wrote:

I agree that rescheduling the loads/stores is correct even if src and dst are equal. However, the metadata itself is still incorrect: It will claim that the loads/stores are NoAlias, even though they are actually MustAlias.

The proposed version introduces 2 scopes, assigning all stores to one scope and all loads to another.

In order to avoid the MustAlias vs NoAlias problem, we could introduce a scope per load-store pair. Of course, that might be a lot of extra scopes.

This was discussed earlier today in LLVM's AA Technical call. This method should work, but one possible gotcha came up: sometimes a memcpy lowering results in overlapping load/stores. Those overlapping load/stores must remain 'aliasing', so they should belong to the same scope.

possible example:

// memcpy(dst, src, 23) becomes:
store i64 dst+0, (load i64, src+0)   // scope 0
store i64 dst+8, (load i64, src+8)   // scope 1, not overlapping with previous pair
store i64 dst+15, (load i64, src+15) // also scope 1, overlapping with previous pair
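
The grouping rule in this example (overlapping pairs share a scope) can be sketched in standalone C++. This is a simplified model, not SelectionDAG's actual lowering: `splitWithOverlappingTail` and `assignScopes` are hypothetical helpers, and the fixed chunk width with a shifted-back tail is an assumption made to reproduce the 23-byte case.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One lowered load/store pair covering bytes [offset, offset + width).
struct Chunk {
  std::size_t offset;
  std::size_t width;
};

// Splits `len` bytes into `width`-byte chunks. If `len` is not a multiple of
// `width`, the last chunk is shifted back so that it ends exactly at `len`
// and therefore overlaps its predecessor.
std::vector<Chunk> splitWithOverlappingTail(std::size_t len,
                                            std::size_t width) {
  assert(width > 0 && len >= width);
  std::vector<Chunk> chunks;
  std::size_t off = 0;
  for (; off + width <= len; off += width)
    chunks.push_back({off, width});
  if (off < len)
    chunks.push_back({len - width, width});
  return chunks;
}

// Assigns a scope id to each chunk (chunks must be sorted by offset). A chunk
// that overlaps the current group keeps that group's scope; otherwise it
// starts a new scope.
std::vector<int> assignScopes(const std::vector<Chunk> &chunks) {
  std::vector<int> scopes;
  int current = -1;
  std::size_t groupEnd = 0;
  for (const Chunk &c : chunks) {
    if (scopes.empty() || c.offset >= groupEnd)
      ++current;
    groupEnd = std::max(groupEnd, c.offset + c.width);
    scopes.push_back(current);
  }
  return scopes;
}
```

For `splitWithOverlappingTail(23, 8)` this yields chunks at offsets 0, 8, and 15, and `assignScopes` groups them as scopes 0, 1, 1, matching the comments in the example above.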

Add MIR printer and parser support for scoped AA metadata generated in the backend.

Herald added subscribers: dexonsmith, kerbowa, pengfei and 3 others. May 19 2021, 1:02 PM

Harbormaster completed remote builds in B105291: Diff 346540. May 19 2021, 1:02 PM

In D102255#2766721, @jeroen.dobbelaere wrote:
In D102255#2765788, @jeroen.dobbelaere wrote:

In D102255#2755372, @nikic wrote:

I agree that rescheduling the loads/stores is correct even if src and dst are equal. However, the metadata itself is still incorrect: It will claim that the loads/stores are NoAlias, even though they are actually MustAlias.

The proposed version introduces 2 scopes, assigning all stores to one scope and all loads to another.

In order to avoid the MustAlias vs NoAlias problem, we could introduce a scope per load-store pair. Of course, that might be a lot of extra scopes.

This was discussed earlier today in LLVM's AA Technical call. This method should work, but one possible gotcha came up: sometimes a memcpy lowering results in overlapping load/stores. Those overlapping load/stores must remain 'aliasing', so they should belong to the same scope.

possible example:
// memcpy(dst, src, 23) becomes:
store i64 dst+0, (load i64, src+0)   // scope 0
store i64 dst+8, (load i64, src+8)   // scope 1, not overlapping with previous pair
store i64 dst+15, (load i64, src+15) // also scope 1, overlapping with previous pair

I wasn't able to attend that meeting. I am not sure the current SDAG would generate that; let me check. If the current SDAG implementation does, we could drop the scoped AA metadata from those trailing (or maybe leading) loads as a conservative solution before we introduce new scopes.

I just read the relevant threads and bug reports on the change that allows exact overlap in llvm.memcpy; see the reference list at the end. Personally, I think it's OK to assume the NoAlias added here. Allowing exact overlap in llvm.memcpy mostly affects basic-aa, which must consider the case where the source and destination of llvm.memcpy are the same. That makes sense at the IR level, where llvm.memcpy is treated as a single op: exact overlap means the copy is a no-op and won't always overwrite the destination memory. But when we lower that copy into loads/stores, we say that the loads/stores won't alias. That's fine, as the order between the load and the store at the same offset (or the same location, in the exact-overlap case) is established through the data dependency. In addition, the no-alias established here is a scoped one, which applies only to the loads/stores from this llvm.memcpy. It won't affect the AA results between them and loads/stores outside the scope. (This patch depends on https://reviews.llvm.org/D102215, which propagates scoped AA on mem ops into loads/stores after lowering.)

But there are definitely issues where the backend may split, merge, narrow, or widen memory operations. Splitting and narrowing seem fine. I started to address the merge issue in https://reviews.llvm.org/D102821 to ensure the correct scoped AA is used on the merged stores or loads. Widening, mostly load widening, is quite tricky: in most cases the widened result doesn't care about the extra content loaded from memory. The widened form is chosen to maximize performance, but simply reusing the scoped AA metadata risks undesired behavior.

https://lists.llvm.org/pipermail/cfe-dev/2020-August/066614.html

PING

Rebase.

In D102255#2769466, @hliao wrote:

Add MIR printer and parser support for scoped AA metadata generated in the backend.

Hi Michael, I think it would be better (easier to review) if you could put this part in a separate patch.

Can you please split off the metadata parsing part into a separate patch? It's not directly related to the memcpy lowering.

Harbormaster completed remote builds in B106143: Diff 347755. May 25 2021, 12:53 PM

In D102255#2780327, @nikic wrote:

Can you please split off the metadata parsing part into a separate patch? It's not directly related to the memcpy lowering.

OK. I need to add unit tests to verify that, as we won't be able to generate machine metadata within the current backend. I will prepare that tonight.

In D102255#2780461, @hliao wrote:

In D102255#2780327, @nikic wrote:

Can you please split off the metadata parsing part into a separate patch? It's not directly related to the memcpy lowering.

OK. I need to add unit tests to verify that, as we won't be able to generate machine metadata within the current backend. I will prepare that tonight.

The MIR printer change is split out into https://reviews.llvm.org/D103205

Clean up after splitting MIR printer and parser changes.

Harbormaster completed remote builds in B106591: Diff 348369. May 27 2021, 1:50 PM

In D102255#2785665, @hliao wrote:

Clean up after splitting MIR printer and parser changes.

Here are the printer change (D103205) and the parser change (D103282).

Rebase

Harbormaster completed remote builds in B106601: Diff 348385. May 27 2021, 2:54 PM

jeroen.dobbelaere added inline comments. May 27 2021, 11:33 PM

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
6398–6421	As mentioned in D102255#2766721, we would still need to use separate scopes for every load/store pair. That's the only way to avoid a possible 'noalias' result when the source and destination happen to be identical. (Based on the AA Tech call discussion, such a situation was considered unacceptable.)

hliao mentioned this in D103205: [MIRPrinter] Add machine metadata support. May 28 2021, 8:50 AM

Rebase to recent changes on the MIR printer (already committed) and parser (under review).

Harbormaster completed remote builds in B110444: Diff 353707. Jun 22 2021, 11:40 AM

Rebase

hliao added inline comments. Jun 28 2021, 8:10 PM

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
6398–6421	Separating scopes for each load/store pair makes it impossible to freely schedule the loads/stores generated from the same memcpy, as they won't belong to the same scope. The exact overlap described for the memcpy builtin refers to the situation where an optimizer should not assume distinct memory operands from outside of the memcpy. From another perspective, such lowering of memcpy is equivalent to inlining a memcpy function implementation at the LLVM IR level. Could you explain more about the concern?

Harbormaster completed remote builds in B111427: Diff 355096. Jun 28 2021, 8:36 PM

Rebase and resolve conflicts. Kindly PING for review.

BTW, the newly generated AA metadata only changes the alias checking results among the loads/stores generated from this lowered memcpy. It won't change any other alias checking results.

Harbormaster completed remote builds in B112673: Diff 356804. Jul 6 2021, 1:13 PM

hliao mentioned this in D105721: [amdgpu] Add scope metadata support for noalias kernel arguments. Jul 9 2021, 12:17 PM

Revision Contents

Path	Size
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp	35 lines
llvm/test/CodeGen/AArch64/arm64-2012-05-07-MemcpyAlignBug.ll	2 lines
llvm/test/CodeGen/AArch64/arm64-memcpy-inline.ll	14 lines
llvm/test/CodeGen/AArch64/arm64-misaligned-memcpy-inline.ll	16 lines
llvm/test/CodeGen/AArch64/memcpy-scoped-aa.ll	39 lines
llvm/test/CodeGen/AMDGPU/memcpy-scoped-aa.ll	26 lines
llvm/test/CodeGen/PowerPC/pr45301.ll	20 lines
llvm/test/CodeGen/X86/memcpy-scoped-aa.ll	50 lines

Diff 355096

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp

[46 lines not shown]
#include "llvm/IR/Constant.h"		#include "llvm/IR/Constant.h"
#include "llvm/IR/Constants.h"		#include "llvm/IR/Constants.h"
#include "llvm/IR/DataLayout.h"		#include "llvm/IR/DataLayout.h"
#include "llvm/IR/DebugInfoMetadata.h"		#include "llvm/IR/DebugInfoMetadata.h"
#include "llvm/IR/DebugLoc.h"		#include "llvm/IR/DebugLoc.h"
#include "llvm/IR/DerivedTypes.h"		#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/Function.h"		#include "llvm/IR/Function.h"
#include "llvm/IR/GlobalValue.h"		#include "llvm/IR/GlobalValue.h"
		#include "llvm/IR/MDBuilder.h"
#include "llvm/IR/Metadata.h"		#include "llvm/IR/Metadata.h"
#include "llvm/IR/Type.h"		#include "llvm/IR/Type.h"
#include "llvm/IR/Value.h"		#include "llvm/IR/Value.h"
#include "llvm/Support/Casting.h"		#include "llvm/Support/Casting.h"
#include "llvm/Support/CodeGen.h"		#include "llvm/Support/CodeGen.h"
#include "llvm/Support/Compiler.h"		#include "llvm/Support/Compiler.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/ErrorHandling.h"		#include "llvm/Support/ErrorHandling.h"
[29 lines not shown]
void SelectionDAG::DAGUpdateListener::NodeDeleted(SDNode*, SDNode*) {}		void SelectionDAG::DAGUpdateListener::NodeDeleted(SDNode*, SDNode*) {}
void SelectionDAG::DAGUpdateListener::NodeUpdated(SDNode*) {}		void SelectionDAG::DAGUpdateListener::NodeUpdated(SDNode*) {}
void SelectionDAG::DAGUpdateListener::NodeInserted(SDNode *) {}		void SelectionDAG::DAGUpdateListener::NodeInserted(SDNode *) {}

void SelectionDAG::DAGNodeDeletedListener::anchor() {}		void SelectionDAG::DAGNodeDeletedListener::anchor() {}

#define DEBUG_TYPE "selectiondag"		#define DEBUG_TYPE "selectiondag"

		static cl::opt<bool> EnableMemCpyScopedNoAlias(
		"enable-memcpy-scoped-noalias", cl::Hidden, cl::init(true),
		cl::desc("Enable scoped no-alias support during memcpy lowering"));

static cl::opt<bool> EnableMemCpyDAGOpt("enable-memcpy-dag-opt",		static cl::opt<bool> EnableMemCpyDAGOpt("enable-memcpy-dag-opt",
cl::Hidden, cl::init(true),		cl::Hidden, cl::init(true),
cl::desc("Gang up loads and stores generated by inlining of memcpy"));		cl::desc("Gang up loads and stores generated by inlining of memcpy"));

static cl::opt<int> MaxLdStGlue("ldstmemcpy-glue-max",		static cl::opt<int> MaxLdStGlue("ldstmemcpy-glue-max",
cl::desc("Number limit for gluing ld/st of memcpy."),		cl::desc("Number limit for gluing ld/st of memcpy."),
cl::Hidden, cl::init(0));		cl::Hidden, cl::init(0));

[6,277 lines not shown]	if (NewAlign > Alignment) {
Alignment = NewAlign;		Alignment = NewAlign;
}		}
}		}

// Prepare AAInfo for loads/stores after lowering this memcpy.		// Prepare AAInfo for loads/stores after lowering this memcpy.
AAMDNodes NewAAInfo = AAInfo;		AAMDNodes NewAAInfo = AAInfo;
NewAAInfo.TBAA = NewAAInfo.TBAAStruct = nullptr;		NewAAInfo.TBAA = NewAAInfo.TBAAStruct = nullptr;

		AAMDNodes DstAAInfo, SrcAAInfo;
		DstAAInfo = SrcAAInfo = NewAAInfo;
		// Generate new scoped AA metadata for this memcpy instance if enabled.
		if (EnableMemCpyScopedNoAlias) {
		MDBuilder MDB(*DAG.getContext());
		MDNode *Domain =
		MDB.createAnonymousAliasScopeDomain("MemcpyLoweringDomain");
		MDNode *DstScope = MDB.createAnonymousAliasScope(Domain, "Dst");
		MDNode *SrcScope = MDB.createAnonymousAliasScope(Domain, "Src");
		MDNode *DstAliasScope = MDNode::concatenate(
		NewAAInfo.Scope, MDNode::get(*DAG.getContext(), {DstScope}));
		MDNode *DstNoAliase = MDNode::concatenate(
		NewAAInfo.NoAlias, MDNode::get(*DAG.getContext(), {SrcScope}));
		MDNode *SrcAliasScope = MDNode::concatenate(
		NewAAInfo.Scope, MDNode::get(*DAG.getContext(), {SrcScope}));
		MDNode *SrcNoAliase = MDNode::concatenate(
		NewAAInfo.NoAlias, MDNode::get(*DAG.getContext(), {DstScope}));

		DstAAInfo.Scope = DstAliasScope;
		DstAAInfo.NoAlias = DstNoAliase;
		SrcAAInfo.Scope = SrcAliasScope;
		SrcAAInfo.NoAlias = SrcNoAliase;
		}

MachineMemOperand::Flags MMOFlags =		MachineMemOperand::Flags MMOFlags =
isVol ? MachineMemOperand::MOVolatile : MachineMemOperand::MONone;		isVol ? MachineMemOperand::MOVolatile : MachineMemOperand::MONone;
SmallVector<SDValue, 16> OutLoadChains;		SmallVector<SDValue, 16> OutLoadChains;
SmallVector<SDValue, 16> OutStoreChains;		SmallVector<SDValue, 16> OutStoreChains;
SmallVector<SDValue, 32> OutChains;		SmallVector<SDValue, 32> OutChains;
unsigned NumMemOps = MemOps.size();		unsigned NumMemOps = MemOps.size();
uint64_t SrcOff = 0, DstOff = 0;		uint64_t SrcOff = 0, DstOff = 0;
for (unsigned i = 0; i != NumMemOps; ++i) {		for (unsigned i = 0; i != NumMemOps; ++i) {
[26 lines not shown]	if (CopyFromConstant &&
SubSlice.Offset = 0;		SubSlice.Offset = 0;
SubSlice.Length = VTSize;		SubSlice.Length = VTSize;
}		}
Value = getMemsetStringVal(VT, dl, DAG, TLI, SubSlice);		Value = getMemsetStringVal(VT, dl, DAG, TLI, SubSlice);
if (Value.getNode()) {		if (Value.getNode()) {
Store = DAG.getStore(		Store = DAG.getStore(
Chain, dl, Value,		Chain, dl, Value,
DAG.getMemBasePlusOffset(Dst, TypeSize::Fixed(DstOff), dl),		DAG.getMemBasePlusOffset(Dst, TypeSize::Fixed(DstOff), dl),
DstPtrInfo.getWithOffset(DstOff), Alignment, MMOFlags, NewAAInfo);		DstPtrInfo.getWithOffset(DstOff), Alignment, MMOFlags, DstAAInfo);
OutChains.push_back(Store);		OutChains.push_back(Store);
}		}
}		}

if (!Store.getNode()) {		if (!Store.getNode()) {
// The type might not be legal for the target. This should only happen		// The type might not be legal for the target. This should only happen
// if the type is smaller than a legal type, as on PPC, so the right		// if the type is smaller than a legal type, as on PPC, so the right
// thing to do is generate a LoadExt/StoreTrunc pair. These simplify		// thing to do is generate a LoadExt/StoreTrunc pair. These simplify
// to Load/Store if NVT==VT.		// to Load/Store if NVT==VT.
// FIXME does the case above also need this?		// FIXME does the case above also need this?
EVT NVT = TLI.getTypeToTransformTo(C, VT);		EVT NVT = TLI.getTypeToTransformTo(C, VT);
assert(NVT.bitsGE(VT));		assert(NVT.bitsGE(VT));

bool isDereferenceable =		bool isDereferenceable =
SrcPtrInfo.getWithOffset(SrcOff).isDereferenceable(VTSize, C, DL);		SrcPtrInfo.getWithOffset(SrcOff).isDereferenceable(VTSize, C, DL);
MachineMemOperand::Flags SrcMMOFlags = MMOFlags;		MachineMemOperand::Flags SrcMMOFlags = MMOFlags;
if (isDereferenceable)		if (isDereferenceable)
SrcMMOFlags |= MachineMemOperand::MODereferenceable;		SrcMMOFlags |= MachineMemOperand::MODereferenceable;

Value = DAG.getExtLoad(		Value = DAG.getExtLoad(
ISD::EXTLOAD, dl, NVT, Chain,		ISD::EXTLOAD, dl, NVT, Chain,
DAG.getMemBasePlusOffset(Src, TypeSize::Fixed(SrcOff), dl),		DAG.getMemBasePlusOffset(Src, TypeSize::Fixed(SrcOff), dl),
SrcPtrInfo.getWithOffset(SrcOff), VT,		SrcPtrInfo.getWithOffset(SrcOff), VT,
commonAlignment(*SrcAlign, SrcOff), SrcMMOFlags, NewAAInfo);		commonAlignment(*SrcAlign, SrcOff), SrcMMOFlags, SrcAAInfo);
OutLoadChains.push_back(Value.getValue(1));		OutLoadChains.push_back(Value.getValue(1));

Store = DAG.getTruncStore(		Store = DAG.getTruncStore(
Chain, dl, Value,		Chain, dl, Value,
DAG.getMemBasePlusOffset(Dst, TypeSize::Fixed(DstOff), dl),		DAG.getMemBasePlusOffset(Dst, TypeSize::Fixed(DstOff), dl),
DstPtrInfo.getWithOffset(DstOff), VT, Alignment, MMOFlags, NewAAInfo);		DstPtrInfo.getWithOffset(DstOff), VT, Alignment, MMOFlags, DstAAInfo);
OutStoreChains.push_back(Store);		OutStoreChains.push_back(Store);
}		}
SrcOff += VTSize;		SrcOff += VTSize;
DstOff += VTSize;		DstOff += VTSize;
Size -= VTSize;		Size -= VTSize;
}		}

unsigned GluedLdStLimit = MaxLdStGlue == 0 ?		unsigned GluedLdStLimit = MaxLdStGlue == 0 ?
[4,126 lines not shown]

llvm/test/CodeGen/AArch64/arm64-2012-05-07-MemcpyAlignBug.ll

	; RUN: llc < %s -mtriple=arm64-eabi -mcpu=cyclone | FileCheck %s			; RUN: llc < %s -mtriple=arm64-eabi -mcpu=cyclone | FileCheck %s
	; <rdar://problem/11294426>			; <rdar://problem/11294426>

	@b = private unnamed_addr constant [3 x i32] [i32 1768775988, i32 1685481784, i32 1836253201], align 4			@b = private unnamed_addr constant [3 x i32] [i32 1768775988, i32 1685481784, i32 1836253201], align 4

	; The important thing for this test is that we need an unaligned load of `l_b'			; The important thing for this test is that we need an unaligned load of `l_b'
	; ("ldr w2, [x1, #8]" in this case).			; ("ldr w2, [x1, #8]" in this case).

	; CHECK: adrp x[[PAGE:[0-9]+]], {{l_b@PAGE|.Lb}}			; CHECK: adrp x[[PAGE:[0-9]+]], {{l_b@PAGE|.Lb}}
	; CHECK: add x[[ADDR:[0-9]+]], x[[PAGE]], {{l_b@PAGEOFF|:lo12:.Lb}}			; CHECK: add x[[ADDR:[0-9]+]], x[[PAGE]], {{l_b@PAGEOFF|:lo12:.Lb}}
	; CHECK-NEXT: ldr [[VAL2:x[0-9]+]], [x[[ADDR]]]			; CHECK-NEXT: ldr [[VAL2:x[0-9]+]], [x[[ADDR]]]
				; CHECK-NEXT: str [[VAL2]], [x0]
	; CHECK-NEXT: ldr [[VAL:w[0-9]+]], [x[[ADDR]], #8]			; CHECK-NEXT: ldr [[VAL:w[0-9]+]], [x[[ADDR]], #8]
	; CHECK-NEXT: str [[VAL]], [x0, #8]			; CHECK-NEXT: str [[VAL]], [x0, #8]
	; CHECK-NEXT: str [[VAL2]], [x0]

	define void @foo(i8* %a) {			define void @foo(i8* %a) {
	call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %a, i8* align 4 bitcast ([3 x i32]* @b to i8*), i64 12, i1 false)			call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %a, i8* align 4 bitcast ([3 x i32]* @b to i8*), i64 12, i1 false)
	ret void			ret void
	}			}

	declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture, i8* nocapture, i64, i1) nounwind			declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture, i8* nocapture, i64, i1) nounwind

llvm/test/CodeGen/AArch64/arm64-memcpy-inline.ll

	[21 lines not shown]
	; CHECK-DAG: str [[REG2]],			; CHECK-DAG: str [[REG2]],
	call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 8 getelementptr inbounds (%struct.x, %struct.x* @dst, i32 0, i32 0), i8* align 8 getelementptr inbounds (%struct.x, %struct.x* @src, i32 0, i32 0), i32 11, i1 false)			call void @llvm.memcpy.p0i8.p0i8.i32(i8* align 8 getelementptr inbounds (%struct.x, %struct.x* @dst, i32 0, i32 0), i8* align 8 getelementptr inbounds (%struct.x, %struct.x* @src, i32 0, i32 0), i32 11, i1 false)
	ret i32 0			ret i32 0
	}			}

	define void @t1(i8* nocapture %C) nounwind {			define void @t1(i8* nocapture %C) nounwind {
	entry:			entry:
	; CHECK-LABEL: t1:			; CHECK-LABEL: t1:
	; CHECK: ldr [[DEST:q[0-9]+]], [x[[BASEREG]]]			; CHECK: ldr [[REG0:q[0-9]+]], [x[[BASEREG:[0-9]+]]]
	; CHECK: ldur [[DEST:q[0-9]+]], [x[[BASEREG:[0-9]+]], #15]			; CHECK: str [[REG0]], [x0]
	; CHECK: stur [[DEST:q[0-9]+]], [x0, #15]			; CHECK: ldur [[REG1:q[0-9]+]], [x[[BASEREG]], #15]
	; CHECK: str [[DEST:q[0-9]+]], [x0]			; CHECK: stur [[REG1]], [x0, #15]
	tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %C, i8* getelementptr inbounds ([31 x i8], [31 x i8]* @.str1, i64 0, i64 0), i64 31, i1 false)			tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %C, i8* getelementptr inbounds ([31 x i8], [31 x i8]* @.str1, i64 0, i64 0), i64 31, i1 false)
	ret void			ret void
	}			}

	define void @t2(i8* nocapture %C) nounwind {			define void @t2(i8* nocapture %C) nounwind {
	entry:			entry:
	; CHECK-LABEL: t2:			; CHECK-LABEL: t2:
	; CHECK: mov [[REG3:w[0-9]+]]			; CHECK: mov [[REG3:w[0-9]+]]
	; CHECK: movk [[REG3]],			; CHECK: movk [[REG3]],
	; CHECK: str [[REG3]], [x0, #32]			; CHECK: str [[REG3]], [x0, #32]
	; CHECK: ldp [[DEST1:q[0-9]+]], [[DEST2:q[0-9]+]], [x{{[0-9]+}}]			; CHECK: ldp [[DEST1:q[0-9]+]], [[DEST2:q[0-9]+]], [x{{[0-9]+}}]
	; CHECK: stp [[DEST1]], [[DEST2]], [x0]			; CHECK: stp [[DEST1]], [[DEST2]], [x0]
	tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %C, i8* getelementptr inbounds ([36 x i8], [36 x i8]* @.str2, i64 0, i64 0), i64 36, i1 false)			tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %C, i8* getelementptr inbounds ([36 x i8], [36 x i8]* @.str2, i64 0, i64 0), i64 36, i1 false)
	ret void			ret void
	}			}

	define void @t3(i8* nocapture %C) nounwind {			define void @t3(i8* nocapture %C) nounwind {
	entry:			entry:
	; CHECK-LABEL: t3:			; CHECK-LABEL: t3:
	; CHECK: ldr [[DEST:q[0-9]+]], [x[[BASEREG]]]			; CHECK: ldr [[DEST:q[0-9]+]], [x[[BASEREG:[0-9]+]]]
	; CHECK: ldr [[REG4:x[0-9]+]], [x[[BASEREG:[0-9]+]], #16]
	; CHECK: str [[REG4]], [x0, #16]
	; CHECK: str [[DEST]], [x0]			; CHECK: str [[DEST]], [x0]
				; CHECK: ldr [[REG4:x[0-9]+]], [x[[BASEREG]], #16]
				; CHECK: str [[REG4]], [x0, #16]
	tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %C, i8* getelementptr inbounds ([24 x i8], [24 x i8]* @.str3, i64 0, i64 0), i64 24, i1 false)			tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %C, i8* getelementptr inbounds ([24 x i8], [24 x i8]* @.str3, i64 0, i64 0), i64 24, i1 false)
	ret void			ret void
	}			}

	define void @t4(i8* nocapture %C) nounwind {			define void @t4(i8* nocapture %C) nounwind {
	entry:			entry:
	; CHECK-LABEL: t4:			; CHECK-LABEL: t4:
	; CHECK: mov [[REG5:w[0-9]+]], #32			; CHECK: mov [[REG5:w[0-9]+]], #32
	[46 lines not shown]

llvm/test/CodeGen/AArch64/arm64-misaligned-memcpy-inline.ll

[20 lines not shown]	entry:
call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 %out, i8* align 8 %in, i64 16, i1 false)		call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 %out, i8* align 8 %in, i64 16, i1 false)
ret void		ret void
}		}

; Tiny (4 bytes here) unaligned memcpy() should be inlined with byte sized		; Tiny (4 bytes here) unaligned memcpy() should be inlined with byte sized
; loads and stores if strict-alignment is turned on.		; loads and stores if strict-alignment is turned on.
define void @t2(i8* %out, i8* %in) {		define void @t2(i8* %out, i8* %in) {
; CHECK-LABEL: t2:		; CHECK-LABEL: t2:
; CHECK: ldrb w{{[0-9]+}}, [x1, #3]		; CHECK: ldrb w[[V0:[0-9]+]], [x1]
; CHECK-NEXT: ldrb w{{[0-9]+}}, [x1, #2]		; CHECK-NEXT: ldrb w[[V1:[0-9]+]], [x1, #1]
; CHECK-NEXT: ldrb w{{[0-9]+}}, [x1, #1]		; CHECK-NEXT: ldrb w[[V2:[0-9]+]], [x1, #2]
; CHECK-NEXT: ldrb w{{[0-9]+}}, [x1]		; CHECK-NEXT: ldrb w[[V3:[0-9]+]], [x1, #3]
; CHECK-NEXT: strb w{{[0-9]+}}, [x0, #3]		; CHECK-NEXT: strb w[[V0]], [x0]
; CHECK-NEXT: strb w{{[0-9]+}}, [x0, #2]		; CHECK-NEXT: strb w[[V1]], [x0, #1]
; CHECK-NEXT: strb w{{[0-9]+}}, [x0, #1]		; CHECK-NEXT: strb w[[V2]], [x0, #2]
; CHECK-NEXT: strb w{{[0-9]+}}, [x0]		; CHECK-NEXT: strb w[[V3]], [x0, #3]
entry:		entry:
call void @llvm.memcpy.p0i8.p0i8.i64(i8* %out, i8* %in, i64 4, i1 false)		call void @llvm.memcpy.p0i8.p0i8.i64(i8* %out, i8* %in, i64 4, i1 false)
ret void		ret void
}		}

declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture, i8* nocapture readonly, i64, i1)		declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture, i8* nocapture readonly, i64, i1)

llvm/test/CodeGen/AArch64/memcpy-scoped-aa.ll

; RUN: llc -mtriple=aarch64-linux-gnu -o - %s | FileCheck %s		; RUN: llc -mtriple=aarch64-linux-gnu -o - %s | FileCheck %s
; RUN: llc -mtriple=aarch64-linux-gnu -stop-after=finalize-isel -o - %s | FileCheck --check-prefix=MIR %s		; RUN: llc -mtriple=aarch64-linux-gnu -stop-after=finalize-isel -o - %s | FileCheck --check-prefix=MIR %s

; MIR-DAG: ![[DOMAIN:[0-9]+]] = distinct !{!{{[0-9]+}}, !"bax"}		; MIR-DAG: ![[DOMAIN:[0-9]+]] = distinct !{!{{[0-9]+}}, !"bax"}
; MIR-DAG: ![[SCOPE0:[0-9]+]] = distinct !{!{{[0-9]+}}, ![[DOMAIN]], !"bax: %p"}		; MIR-DAG: ![[SCOPE0:[0-9]+]] = distinct !{!{{[0-9]+}}, ![[DOMAIN]], !"bax: %p"}
; MIR-DAG: ![[SCOPE1:[0-9]+]] = distinct !{!{{[0-9]+}}, ![[DOMAIN]], !"bax: %q"}		; MIR-DAG: ![[SCOPE1:[0-9]+]] = distinct !{!{{[0-9]+}}, ![[DOMAIN]], !"bax: %q"}
; MIR-DAG: ![[SET0:[0-9]+]] = !{![[SCOPE0]]}		; MIR-DAG: ![[SET0:[0-9]+]] = !{![[SCOPE0]]}
; MIR-DAG: ![[SET1:[0-9]+]] = !{![[SCOPE1]]}		; MIR-DAG: ![[SET1:[0-9]+]] = !{![[SCOPE1]]}

; MIR-LABEL: name: test_memcpy		; MIR-LABEL: name: test_memcpy
; MIR: %2:fpr128 = LDRQui %0, 1 :: (load 16 from %ir.p1, align 4, !alias.scope ![[SET0]], !noalias ![[SET1]])		; MIR: machineMetadataNodes:
; MIR-NEXT: STRQui killed %2, %0, 0 :: (store 16 into %ir.p0, align 4, !alias.scope ![[SET0]], !noalias ![[SET1]])		; MIR-DAG: ![[MMDOMAIN:[0-9]+]] = distinct !{!{{[0-9]+}}, !"MemcpyLoweringDomain"}
		; MIR-DAG: ![[MMSCOPE0:[0-9]+]] = distinct !{!{{[0-9]+}}, ![[MMDOMAIN]], !"Src"}
		; MIR-DAG: ![[MMSCOPE1:[0-9]+]] = distinct !{!{{[0-9]+}}, ![[MMDOMAIN]], !"Dst"}
		; MIR-DAG: ![[MMSET0:[0-9]+]] = !{![[SCOPE0]], ![[MMSCOPE0]]}
		; MIR-DAG: ![[MMSET1:[0-9]+]] = !{![[SCOPE0]], ![[MMSCOPE1]]}
		; MIR-DAG: ![[MMSET2:[0-9]+]] = !{![[SCOPE1]], ![[MMSCOPE1]]}
		; MIR-DAG: ![[MMSET3:[0-9]+]] = !{![[SCOPE1]], ![[MMSCOPE0]]}
		; MIR: body:
		; MIR: %2:fpr128 = LDRQui %0, 1 :: (load 16 from %ir.p1, align 4, !alias.scope ![[MMSET0]], !noalias ![[MMSET2]])
		; MIR-NEXT: STRQui killed %2, %0, 0 :: (store 16 into %ir.p0, align 4, !alias.scope ![[MMSET1]], !noalias ![[MMSET3]])
define i32 @test_memcpy(i32* nocapture %p, i32* nocapture readonly %q) {		define i32 @test_memcpy(i32* nocapture %p, i32* nocapture readonly %q) {
; CHECK-LABEL: test_memcpy:		; CHECK-LABEL: test_memcpy:
; CHECK-DAG: ldp [[Q0:w[0-9]+]], [[Q1:w[0-9]+]], [x1]		; CHECK-DAG: ldp [[Q0:w[0-9]+]], [[Q1:w[0-9]+]], [x1]
; CHECK-DAG: ldr [[PVAL:q[0-9]+]], [x0, #16]		; CHECK-DAG: ldr [[PVAL:q[0-9]+]], [x0, #16]
; CHECK-DAG: add w8, [[Q0]], [[Q1]]		; CHECK-DAG: add w8, [[Q0]], [[Q1]]
; CHECK: str [[PVAL]], [x0]		; CHECK: str [[PVAL]], [x0]
; CHECK: mov w0, w8		; CHECK: mov w0, w8
; CHECK: ret		; CHECK: ret
%p0 = bitcast i32* %p to i8*		%p0 = bitcast i32* %p to i8*
%add.ptr = getelementptr inbounds i32, i32* %p, i64 4		%add.ptr = getelementptr inbounds i32, i32* %p, i64 4
%p1 = bitcast i32* %add.ptr to i8*		%p1 = bitcast i32* %add.ptr to i8*
tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* noundef nonnull align 4 dereferenceable(16) %p0, i8* noundef nonnull align 4 dereferenceable(16) %p1, i64 16, i1 false), !alias.scope !2, !noalias !4		tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* noundef nonnull align 4 dereferenceable(16) %p0, i8* noundef nonnull align 4 dereferenceable(16) %p1, i64 16, i1 false), !alias.scope !2, !noalias !4
%v0 = load i32, i32* %q, align 4, !alias.scope !4, !noalias !2		%v0 = load i32, i32* %q, align 4, !alias.scope !4, !noalias !2
%q1 = getelementptr inbounds i32, i32* %q, i64 1		%q1 = getelementptr inbounds i32, i32* %q, i64 1
%v1 = load i32, i32* %q1, align 4, !alias.scope !4, !noalias !2		%v1 = load i32, i32* %q1, align 4, !alias.scope !4, !noalias !2
%add = add i32 %v0, %v1		%add = add i32 %v0, %v1
ret i32 %add		ret i32 %add
}		}

; MIR-LABEL: name: test_memcpy_inline		; MIR-LABEL: name: test_memcpy_inline
; MIR: %2:fpr128 = LDRQui %0, 1 :: (load 16 from %ir.p1, align 4, !alias.scope ![[SET0]], !noalias ![[SET1]])		; MIR: machineMetadataNodes:
; MIR-NEXT: STRQui killed %2, %0, 0 :: (store 16 into %ir.p0, align 4, !alias.scope ![[SET0]], !noalias ![[SET1]])		; MIR-DAG: ![[MMDOMAIN:[0-9]+]] = distinct !{!{{[0-9]+}}, !"MemcpyLoweringDomain"}
		; MIR-DAG: ![[MMSCOPE0:[0-9]+]] = distinct !{!{{[0-9]+}}, ![[MMDOMAIN]], !"Src"}
		; MIR-DAG: ![[MMSCOPE1:[0-9]+]] = distinct !{!{{[0-9]+}}, ![[MMDOMAIN]], !"Dst"}
		; MIR-DAG: ![[MMSET0:[0-9]+]] = !{![[SCOPE0]], ![[MMSCOPE0]]}
		; MIR-DAG: ![[MMSET1:[0-9]+]] = !{![[SCOPE0]], ![[MMSCOPE1]]}
		; MIR-DAG: ![[MMSET2:[0-9]+]] = !{![[SCOPE1]], ![[MMSCOPE1]]}
		; MIR-DAG: ![[MMSET3:[0-9]+]] = !{![[SCOPE1]], ![[MMSCOPE0]]}
		; MIR: body:
		; MIR: %2:fpr128 = LDRQui %0, 1 :: (load 16 from %ir.p1, align 4, !alias.scope ![[MMSET0]], !noalias ![[MMSET2]])
		; MIR-NEXT: STRQui killed %2, %0, 0 :: (store 16 into %ir.p0, align 4, !alias.scope ![[MMSET1]], !noalias ![[MMSET3]])
define i32 @test_memcpy_inline(i32* nocapture %p, i32* nocapture readonly %q) {		define i32 @test_memcpy_inline(i32* nocapture %p, i32* nocapture readonly %q) {
; CHECK-LABEL: test_memcpy_inline:		; CHECK-LABEL: test_memcpy_inline:
; CHECK-DAG: ldp [[Q0:w[0-9]+]], [[Q1:w[0-9]+]], [x1]		; CHECK-DAG: ldp [[Q0:w[0-9]+]], [[Q1:w[0-9]+]], [x1]
; CHECK-DAG: ldr [[PVAL:q[0-9]+]], [x0, #16]		; CHECK-DAG: ldr [[PVAL:q[0-9]+]], [x0, #16]
; CHECK-DAG: add w8, [[Q0]], [[Q1]]		; CHECK-DAG: add w8, [[Q0]], [[Q1]]
; CHECK: str [[PVAL]], [x0]		; CHECK: str [[PVAL]], [x0]
; CHECK: mov w0, w8		; CHECK: mov w0, w8
; CHECK: ret		; CHECK: ret
[... 47 lines not shown ...]
%v0 = load i32, i32* %q, align 4, !alias.scope !4, !noalias !2		%v0 = load i32, i32* %q, align 4, !alias.scope !4, !noalias !2
%q1 = getelementptr inbounds i32, i32* %q, i64 1		%q1 = getelementptr inbounds i32, i32* %q, i64 1
%v1 = load i32, i32* %q1, align 4, !alias.scope !4, !noalias !2		%v1 = load i32, i32* %q1, align 4, !alias.scope !4, !noalias !2
%add = add i32 %v0, %v1		%add = add i32 %v0, %v1
ret i32 %add		ret i32 %add
}		}

; MIR-LABEL: name: test_mempcpy		; MIR-LABEL: name: test_mempcpy
; MIR: %2:fpr128 = LDRQui %0, 1 :: (load 16 from %ir.p1, align 1, !alias.scope ![[SET0]], !noalias ![[SET1]])		; MIR: machineMetadataNodes:
; MIR-NEXT: STRQui killed %2, %0, 0 :: (store 16 into %ir.p0, align 1, !alias.scope ![[SET0]], !noalias ![[SET1]])		; MIR-DAG: ![[MMDOMAIN:[0-9]+]] = distinct !{!{{[0-9]+}}, !"MemcpyLoweringDomain"}
		; MIR-DAG: ![[MMSCOPE0:[0-9]+]] = distinct !{!{{[0-9]+}}, ![[MMDOMAIN]], !"Src"}
		; MIR-DAG: ![[MMSCOPE1:[0-9]+]] = distinct !{!{{[0-9]+}}, ![[MMDOMAIN]], !"Dst"}
		; MIR-DAG: ![[MMSET0:[0-9]+]] = !{![[SCOPE0]], ![[MMSCOPE0]]}
		; MIR-DAG: ![[MMSET1:[0-9]+]] = !{![[SCOPE0]], ![[MMSCOPE1]]}
		; MIR-DAG: ![[MMSET2:[0-9]+]] = !{![[SCOPE1]], ![[MMSCOPE1]]}
		; MIR-DAG: ![[MMSET3:[0-9]+]] = !{![[SCOPE1]], ![[MMSCOPE0]]}
		; MIR: body:
		; MIR: %2:fpr128 = LDRQui %0, 1 :: (load 16 from %ir.p1, align 1, !alias.scope ![[MMSET0]], !noalias ![[MMSET2]])
		; MIR-NEXT: STRQui killed %2, %0, 0 :: (store 16 into %ir.p0, align 1, !alias.scope ![[MMSET1]], !noalias ![[MMSET3]])
define i32 @test_mempcpy(i32* nocapture %p, i32* nocapture readonly %q) {		define i32 @test_mempcpy(i32* nocapture %p, i32* nocapture readonly %q) {
; CHECK-LABEL: test_mempcpy:		; CHECK-LABEL: test_mempcpy:
; CHECK-DAG: ldp [[Q0:w[0-9]+]], [[Q1:w[0-9]+]], [x1]		; CHECK-DAG: ldp [[Q0:w[0-9]+]], [[Q1:w[0-9]+]], [x1]
; CHECK-DAG: ldr [[PVAL:q[0-9]+]], [x0, #16]		; CHECK-DAG: ldr [[PVAL:q[0-9]+]], [x0, #16]
; CHECK-DAG: add w8, [[Q0]], [[Q1]]		; CHECK-DAG: add w8, [[Q0]], [[Q1]]
; CHECK: str [[PVAL]], [x0]		; CHECK: str [[PVAL]], [x0]
; CHECK: mov w0, w8		; CHECK: mov w0, w8
; CHECK: ret		; CHECK: ret
[... 23 lines not shown ...]

llvm/test/CodeGen/AMDGPU/memcpy-scoped-aa.ll

	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -o - %s | FileCheck %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -o - %s | FileCheck %s
	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -stop-after=finalize-isel -o - %s | FileCheck --check-prefix=MIR %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -stop-after=finalize-isel -o - %s | FileCheck --check-prefix=MIR %s

	; Ensure that the scoped AA is attached on loads/stores lowered from mem ops.			; Ensure that the scoped AA is attached on loads/stores lowered from mem ops.

	; Re-evaluate the slot numbers of scopes as that numbering could be changed run-by-run.			; Re-evaluate the slot numbers of scopes as that numbering could be changed run-by-run.

	; MIR-DAG: ![[DOMAIN:[0-9]+]] = distinct !{!{{[0-9]+}}, !"bax"}			; MIR-DAG: ![[DOMAIN:[0-9]+]] = distinct !{!{{[0-9]+}}, !"bax"}
	; MIR-DAG: ![[SCOPE0:[0-9]+]] = distinct !{!{{[0-9]+}}, ![[DOMAIN]], !"bax: %p"}			; MIR-DAG: ![[SCOPE0:[0-9]+]] = distinct !{!{{[0-9]+}}, ![[DOMAIN]], !"bax: %p"}
	; MIR-DAG: ![[SCOPE1:[0-9]+]] = distinct !{!{{[0-9]+}}, ![[DOMAIN]], !"bax: %q"}			; MIR-DAG: ![[SCOPE1:[0-9]+]] = distinct !{!{{[0-9]+}}, ![[DOMAIN]], !"bax: %q"}
	; MIR-DAG: ![[SET0:[0-9]+]] = !{![[SCOPE0]]}			; MIR-DAG: ![[SET0:[0-9]+]] = !{![[SCOPE0]]}
	; MIR-DAG: ![[SET1:[0-9]+]] = !{![[SCOPE1]]}			; MIR-DAG: ![[SET1:[0-9]+]] = !{![[SCOPE1]]}

	; MIR-LABEL: name: test_memcpy			; MIR-LABEL: name: test_memcpy
	; MIR: %8:vreg_128 = GLOBAL_LOAD_DWORDX4 %9, 16, 0, implicit $exec :: (load 16 from %ir.p1, align 4, !alias.scope ![[SET0]], !noalias ![[SET1]], addrspace 1)			; MIR: machineMetadataNodes:
	; MIR: GLOBAL_STORE_DWORDX4 %10, killed %8, 0, 0, implicit $exec :: (store 16 into %ir.p0, align 4, !alias.scope ![[SET0]], !noalias ![[SET1]], addrspace 1)			; MIR-DAG: ![[MMDOMAIN:[0-9]+]] = distinct !{!{{[0-9]+}}, !"MemcpyLoweringDomain"}
				; MIR-DAG: ![[MMSCOPE0:[0-9]+]] = distinct !{!{{[0-9]+}}, ![[MMDOMAIN]], !"Src"}
				; MIR-DAG: ![[MMSCOPE1:[0-9]+]] = distinct !{!{{[0-9]+}}, ![[MMDOMAIN]], !"Dst"}
				; MIR-DAG: ![[MMSET0:[0-9]+]] = !{![[SCOPE0]], ![[MMSCOPE0]]}
				; MIR-DAG: ![[MMSET1:[0-9]+]] = !{![[SCOPE0]], ![[MMSCOPE1]]}
				; MIR-DAG: ![[MMSET2:[0-9]+]] = !{![[SCOPE1]], ![[MMSCOPE1]]}
				; MIR-DAG: ![[MMSET3:[0-9]+]] = !{![[SCOPE1]], ![[MMSCOPE0]]}
				; MIR: body:
				; MIR: %8:vreg_128 = GLOBAL_LOAD_DWORDX4 %9, 16, 0, implicit $exec :: (load 16 from %ir.p1, align 4, !alias.scope ![[MMSET0]], !noalias ![[MMSET2]], addrspace 1)
				; MIR: GLOBAL_STORE_DWORDX4 %10, killed %8, 0, 0, implicit $exec :: (store 16 into %ir.p0, align 4, !alias.scope ![[MMSET1]], !noalias ![[MMSET3]], addrspace 1)
	define i32 @test_memcpy(i32 addrspace(1)* nocapture %p, i32 addrspace(1)* nocapture readonly %q) {			define i32 @test_memcpy(i32 addrspace(1)* nocapture %p, i32 addrspace(1)* nocapture readonly %q) {
	; Check loads of %q are scheduled ahead of that store of the memcpy on %p.			; Check loads of %q are scheduled ahead of that store of the memcpy on %p.
	; CHECK-LABEL: test_memcpy:			; CHECK-LABEL: test_memcpy:
	; CHECK-DAG: global_load_dwordx2 v{{\[}}[[Q0:[0-9]+]]:[[Q1:[0-9]+]]{{\]}}, v[2:3], off			; CHECK-DAG: global_load_dwordx2 v{{\[}}[[Q0:[0-9]+]]:[[Q1:[0-9]+]]{{\]}}, v[2:3], off
	; CHECK-DAG: global_load_dwordx4 [[PVAL:v\[[0-9]+:[0-9]+\]]], v[0:1], off offset:16			; CHECK-DAG: global_load_dwordx4 [[PVAL:v\[[0-9]+:[0-9]+\]]], v[0:1], off offset:16
	; CHECK-DAG: v_add_nc_u32_e32 v{{[0-9]+}}, v[[Q0]], v[[Q1]]			; CHECK-DAG: v_add_nc_u32_e32 v{{[0-9]+}}, v[[Q0]], v[[Q1]]
	; CHECK: global_store_dwordx4 v[0:1], [[PVAL]], off			; CHECK: global_store_dwordx4 v[0:1], [[PVAL]], off
	; CHECK: s_setpc_b64 s[30:31]			; CHECK: s_setpc_b64 s[30:31]
	%p0 = bitcast i32 addrspace(1)* %p to i8 addrspace(1)*			%p0 = bitcast i32 addrspace(1)* %p to i8 addrspace(1)*
	%add.ptr = getelementptr inbounds i32, i32 addrspace(1)* %p, i64 4			%add.ptr = getelementptr inbounds i32, i32 addrspace(1)* %p, i64 4
	%p1 = bitcast i32 addrspace(1)* %add.ptr to i8 addrspace(1)*			%p1 = bitcast i32 addrspace(1)* %add.ptr to i8 addrspace(1)*
	tail call void @llvm.memcpy.p1i8.p1i8.i64(i8 addrspace(1)* noundef nonnull align 4 dereferenceable(16) %p0, i8 addrspace(1)* noundef nonnull align 4 dereferenceable(16) %p1, i64 16, i1 false), !alias.scope !2, !noalias !4			tail call void @llvm.memcpy.p1i8.p1i8.i64(i8 addrspace(1)* noundef nonnull align 4 dereferenceable(16) %p0, i8 addrspace(1)* noundef nonnull align 4 dereferenceable(16) %p1, i64 16, i1 false), !alias.scope !2, !noalias !4
	%v0 = load i32, i32 addrspace(1)* %q, align 4, !alias.scope !4, !noalias !2			%v0 = load i32, i32 addrspace(1)* %q, align 4, !alias.scope !4, !noalias !2
	%q1 = getelementptr inbounds i32, i32 addrspace(1)* %q, i64 1			%q1 = getelementptr inbounds i32, i32 addrspace(1)* %q, i64 1
	%v1 = load i32, i32 addrspace(1)* %q1, align 4, !alias.scope !4, !noalias !2			%v1 = load i32, i32 addrspace(1)* %q1, align 4, !alias.scope !4, !noalias !2
	%add = add i32 %v0, %v1			%add = add i32 %v0, %v1
	ret i32 %add			ret i32 %add
	}			}

	; MIR-LABEL: name: test_memcpy_inline			; MIR-LABEL: name: test_memcpy_inline
	; MIR: %8:vreg_128 = GLOBAL_LOAD_DWORDX4 %9, 16, 0, implicit $exec :: (load 16 from %ir.p1, align 4, !alias.scope ![[SET0]], !noalias ![[SET1]], addrspace 1)			; MIR: machineMetadataNodes:
	; MIR: GLOBAL_STORE_DWORDX4 %10, killed %8, 0, 0, implicit $exec :: (store 16 into %ir.p0, align 4, !alias.scope ![[SET0]], !noalias ![[SET1]], addrspace 1)			; MIR-DAG: ![[MMDOMAIN:[0-9]+]] = distinct !{!{{[0-9]+}}, !"MemcpyLoweringDomain"}
				; MIR-DAG: ![[MMSCOPE0:[0-9]+]] = distinct !{!{{[0-9]+}}, ![[MMDOMAIN]], !"Src"}
				; MIR-DAG: ![[MMSCOPE1:[0-9]+]] = distinct !{!{{[0-9]+}}, ![[MMDOMAIN]], !"Dst"}
				; MIR-DAG: ![[MMSET0:[0-9]+]] = !{![[SCOPE0]], ![[MMSCOPE0]]}
				; MIR-DAG: ![[MMSET1:[0-9]+]] = !{![[SCOPE0]], ![[MMSCOPE1]]}
				; MIR-DAG: ![[MMSET2:[0-9]+]] = !{![[SCOPE1]], ![[MMSCOPE1]]}
				; MIR-DAG: ![[MMSET3:[0-9]+]] = !{![[SCOPE1]], ![[MMSCOPE0]]}
				; MIR: body:
				; MIR: %8:vreg_128 = GLOBAL_LOAD_DWORDX4 %9, 16, 0, implicit $exec :: (load 16 from %ir.p1, align 4, !alias.scope ![[MMSET0]], !noalias ![[MMSET2]], addrspace 1)
				; MIR: GLOBAL_STORE_DWORDX4 %10, killed %8, 0, 0, implicit $exec :: (store 16 into %ir.p0, align 4, !alias.scope ![[MMSET1]], !noalias ![[MMSET3]], addrspace 1)
	define i32 @test_memcpy_inline(i32 addrspace(1)* nocapture %p, i32 addrspace(1)* nocapture readonly %q) {			define i32 @test_memcpy_inline(i32 addrspace(1)* nocapture %p, i32 addrspace(1)* nocapture readonly %q) {
	; Check loads of %q are scheduled ahead of that store of the memcpy on %p.			; Check loads of %q are scheduled ahead of that store of the memcpy on %p.
	; CHECK-LABEL: test_memcpy_inline:			; CHECK-LABEL: test_memcpy_inline:
	; CHECK-DAG: global_load_dwordx2 v{{\[}}[[Q0:[0-9]+]]:[[Q1:[0-9]+]]{{\]}}, v[2:3], off			; CHECK-DAG: global_load_dwordx2 v{{\[}}[[Q0:[0-9]+]]:[[Q1:[0-9]+]]{{\]}}, v[2:3], off
	; CHECK-DAG: global_load_dwordx4 [[PVAL:v\[[0-9]+:[0-9]+\]]], v[0:1], off offset:16			; CHECK-DAG: global_load_dwordx4 [[PVAL:v\[[0-9]+:[0-9]+\]]], v[0:1], off offset:16
	; CHECK-DAG: v_add_nc_u32_e32 v{{[0-9]+}}, v[[Q0]], v[[Q1]]			; CHECK-DAG: v_add_nc_u32_e32 v{{[0-9]+}}, v[[Q0]], v[[Q1]]
	; CHECK: global_store_dwordx4 v[0:1], [[PVAL]], off			; CHECK: global_store_dwordx4 v[0:1], [[PVAL]], off
	; CHECK: s_setpc_b64 s[30:31]			; CHECK: s_setpc_b64 s[30:31]
	[... 62 lines not shown ...]

llvm/test/CodeGen/PowerPC/pr45301.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=powerpc64-- -verify-machineinstrs \			; RUN: llc -mtriple=powerpc64-- -verify-machineinstrs \
	; RUN: -ppc-asm-full-reg-names < %s | FileCheck %s			; RUN: -ppc-asm-full-reg-names < %s | FileCheck %s
	%struct.e.0.1.2.3.12.29 = type { [10 x i32] }			%struct.e.0.1.2.3.12.29 = type { [10 x i32] }

	define dso_local void @g(%struct.e.0.1.2.3.12.29* %agg.result) local_unnamed_addr #0 {			define dso_local void @g(%struct.e.0.1.2.3.12.29* %agg.result) local_unnamed_addr #0 {
	; CHECK-LABEL: g:			; CHECK-LABEL: g:
	; CHECK: # %bb.0: # %entry			; CHECK: # %bb.0: # %entry
	; CHECK-NEXT: mflr r0			; CHECK-NEXT: mflr r0
	; CHECK-NEXT: std r0, 16(r1)			; CHECK-NEXT: std r0, 16(r1)
	; CHECK-NEXT: stdu r1, -112(r1)			; CHECK-NEXT: stdu r1, -112(r1)
	; CHECK-NEXT: bl i			; CHECK-NEXT: bl i
	; CHECK-NEXT: nop			; CHECK-NEXT: nop
	; CHECK-NEXT: addis r4, r2, g@toc@ha			; CHECK-NEXT: addis r4, r2, g@toc@ha
	; CHECK-NEXT: addi r4, r4, g@toc@l			; CHECK-NEXT: addi r4, r4, g@toc@l
	; CHECK-NEXT: ld r5, 0(r4)			; CHECK-NEXT: ld r5, 0(r4)
	; CHECK-NEXT: std r5, 0(r3)			; CHECK-NEXT: ld r6, 16(r4)
	; CHECK-NEXT: ld r5, 16(r4)			; CHECK-NEXT: ld r7, 8(r4)
	; CHECK-NEXT: std r5, 16(r3)			; CHECK-NEXT: ld r8, 24(r4)
	; CHECK-NEXT: ld r6, 8(r4)
	; CHECK-NEXT: std r6, 8(r3)
	; CHECK-NEXT: ld r6, 24(r4)
	; CHECK-NEXT: std r6, 24(r3)
	; CHECK-NEXT: lwz r6, 0(r3)
	; CHECK-NEXT: ld r4, 32(r4)			; CHECK-NEXT: ld r4, 32(r4)
				; CHECK-NEXT: std r5, 0(r3)
	; CHECK-NEXT: std r4, 32(r3)			; CHECK-NEXT: std r4, 32(r3)
	; CHECK-NEXT: li r4, 20			; CHECK-NEXT: li r4, 20
	; CHECK-NEXT: stwbrx r6, 0, r3			; CHECK-NEXT: lwz r5, 0(r3)
	; CHECK-NEXT: stwbrx r5, r3, r4			; CHECK-NEXT: std r7, 8(r3)
				; CHECK-NEXT: std r8, 24(r3)
				; CHECK-NEXT: std r6, 16(r3)
				; CHECK-NEXT: stwbrx r5, 0, r3
				; CHECK-NEXT: stwbrx r6, r3, r4
				hliao (Author): Functionally the same, but the loads from that memcpy are scheduled ahead in favor of latency hiding.
	; CHECK-NEXT: addi r1, r1, 112			; CHECK-NEXT: addi r1, r1, 112
	; CHECK-NEXT: ld r0, 16(r1)			; CHECK-NEXT: ld r0, 16(r1)
	; CHECK-NEXT: mtlr r0			; CHECK-NEXT: mtlr r0
	; CHECK-NEXT: blr			; CHECK-NEXT: blr
	entry:			entry:
	%call = tail call signext i32 bitcast (i32 (...)* @i to i32 ()*)()			%call = tail call signext i32 bitcast (i32 (...)* @i to i32 ()*)()
	%conv = sext i32 %call to i64			%conv = sext i32 %call to i64
	%0 = inttoptr i64 %conv to i8*			%0 = inttoptr i64 %conv to i8*
	[... 21 lines not shown ...]

llvm/test/CodeGen/X86/memcpy-scoped-aa.ll

; RUN: llc -mtriple=x86_64-linux-gnu -stop-after=finalize-isel -o - %s | FileCheck --check-prefix=MIR %s		; RUN: llc -mtriple=x86_64-linux-gnu -stop-after=finalize-isel -o - %s | FileCheck --check-prefix=MIR %s

; Ensure that the scoped AA is attached on loads/stores lowered from mem ops.		; Ensure that the scoped AA is attached on loads/stores lowered from mem ops.

; Re-evaluate the slot numbers of scopes as that numbering could be changed run-by-run.		; Re-evaluate the slot numbers of scopes as that numbering could be changed run-by-run.

; MIR-DAG: ![[DOMAIN:[0-9]+]] = distinct !{!{{[0-9]+}}, !"bax"}		; MIR-DAG: ![[DOMAIN:[0-9]+]] = distinct !{!{{[0-9]+}}, !"bax"}
; MIR-DAG: ![[SCOPE0:[0-9]+]] = distinct !{!{{[0-9]+}}, ![[DOMAIN]], !"bax: %p"}		; MIR-DAG: ![[SCOPE0:[0-9]+]] = distinct !{!{{[0-9]+}}, ![[DOMAIN]], !"bax: %p"}
; MIR-DAG: ![[SCOPE1:[0-9]+]] = distinct !{!{{[0-9]+}}, ![[DOMAIN]], !"bax: %q"}		; MIR-DAG: ![[SCOPE1:[0-9]+]] = distinct !{!{{[0-9]+}}, ![[DOMAIN]], !"bax: %q"}
; MIR-DAG: ![[SET0:[0-9]+]] = !{![[SCOPE0]]}		; MIR-DAG: ![[SET0:[0-9]+]] = !{![[SCOPE0]]}
; MIR-DAG: ![[SET1:[0-9]+]] = !{![[SCOPE1]]}		; MIR-DAG: ![[SET1:[0-9]+]] = !{![[SCOPE1]]}

; MIR-LABEL: name: test_memcpy		; MIR-LABEL: name: test_memcpy
; MIR: %2:gr64 = MOV64rm %0, 1, $noreg, 16, $noreg :: (load 8 from %ir.p1, align 4, !alias.scope ![[SET0]], !noalias ![[SET1]])		; MIR: machineMetadataNodes:
; MIR-NEXT: %3:gr64 = MOV64rm %0, 1, $noreg, 24, $noreg :: (load 8 from %ir.p1 + 8, align 4, !alias.scope ![[SET0]], !noalias ![[SET1]])		; MIR-DAG: ![[MMDOMAIN:[0-9]+]] = distinct !{!{{[0-9]+}}, !"MemcpyLoweringDomain"}
; MIR-NEXT: MOV64mr %0, 1, $noreg, 8, $noreg, killed %3 :: (store 8 into %ir.p0 + 8, align 4, !alias.scope ![[SET0]], !noalias ![[SET1]])		; MIR-DAG: ![[MMSCOPE0:[0-9]+]] = distinct !{!{{[0-9]+}}, ![[MMDOMAIN]], !"Src"}
; MIR-NEXT: MOV64mr %0, 1, $noreg, 0, $noreg, killed %2 :: (store 8 into %ir.p0, align 4, !alias.scope ![[SET0]], !noalias ![[SET1]])		; MIR-DAG: ![[MMSCOPE1:[0-9]+]] = distinct !{!{{[0-9]+}}, ![[MMDOMAIN]], !"Dst"}
		; MIR-DAG: ![[MMSET0:[0-9]+]] = !{![[SCOPE0]], ![[MMSCOPE0]]}
		; MIR-DAG: ![[MMSET1:[0-9]+]] = !{![[SCOPE0]], ![[MMSCOPE1]]}
		; MIR-DAG: ![[MMSET2:[0-9]+]] = !{![[SCOPE1]], ![[MMSCOPE1]]}
		; MIR-DAG: ![[MMSET3:[0-9]+]] = !{![[SCOPE1]], ![[MMSCOPE0]]}
		; MIR: body:
		; MIR: %2:gr64 = MOV64rm %0, 1, $noreg, 16, $noreg :: (load 8 from %ir.p1, align 4, !alias.scope ![[MMSET0]], !noalias ![[MMSET2]])
		; MIR-NEXT: %3:gr64 = MOV64rm %0, 1, $noreg, 24, $noreg :: (load 8 from %ir.p1 + 8, align 4, !alias.scope ![[MMSET0]], !noalias ![[MMSET2]])
		; MIR-NEXT: MOV64mr %0, 1, $noreg, 8, $noreg, killed %3 :: (store 8 into %ir.p0 + 8, align 4, !alias.scope ![[MMSET1]], !noalias ![[MMSET3]])
		; MIR-NEXT: MOV64mr %0, 1, $noreg, 0, $noreg, killed %2 :: (store 8 into %ir.p0, align 4, !alias.scope ![[MMSET1]], !noalias ![[MMSET3]])
define i32 @test_memcpy(i32* nocapture %p, i32* nocapture readonly %q) {		define i32 @test_memcpy(i32* nocapture %p, i32* nocapture readonly %q) {
%p0 = bitcast i32* %p to i8*		%p0 = bitcast i32* %p to i8*
%add.ptr = getelementptr inbounds i32, i32* %p, i64 4		%add.ptr = getelementptr inbounds i32, i32* %p, i64 4
%p1 = bitcast i32* %add.ptr to i8*		%p1 = bitcast i32* %add.ptr to i8*
tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* noundef nonnull align 4 dereferenceable(16) %p0, i8* noundef nonnull align 4 dereferenceable(16) %p1, i64 16, i1 false), !alias.scope !2, !noalias !4		tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* noundef nonnull align 4 dereferenceable(16) %p0, i8* noundef nonnull align 4 dereferenceable(16) %p1, i64 16, i1 false), !alias.scope !2, !noalias !4
%v0 = load i32, i32* %q, align 4, !alias.scope !4, !noalias !2		%v0 = load i32, i32* %q, align 4, !alias.scope !4, !noalias !2
%q1 = getelementptr inbounds i32, i32* %q, i64 1		%q1 = getelementptr inbounds i32, i32* %q, i64 1
%v1 = load i32, i32* %q1, align 4, !alias.scope !4, !noalias !2		%v1 = load i32, i32* %q1, align 4, !alias.scope !4, !noalias !2
%add = add i32 %v0, %v1		%add = add i32 %v0, %v1
ret i32 %add		ret i32 %add
}		}

; MIR-LABEL: name: test_memcpy_inline		; MIR-LABEL: name: test_memcpy_inline
; MIR: %2:gr64 = MOV64rm %0, 1, $noreg, 16, $noreg :: (load 8 from %ir.p1, align 4, !alias.scope ![[SET0]], !noalias ![[SET1]])		; MIR: machineMetadataNodes:
; MIR-NEXT: %3:gr64 = MOV64rm %0, 1, $noreg, 24, $noreg :: (load 8 from %ir.p1 + 8, align 4, !alias.scope ![[SET0]], !noalias ![[SET1]])		; MIR-DAG: ![[MMDOMAIN:[0-9]+]] = distinct !{!{{[0-9]+}}, !"MemcpyLoweringDomain"}
; MIR-NEXT: MOV64mr %0, 1, $noreg, 8, $noreg, killed %3 :: (store 8 into %ir.p0 + 8, align 4, !alias.scope ![[SET0]], !noalias ![[SET1]])		; MIR-DAG: ![[MMSCOPE0:[0-9]+]] = distinct !{!{{[0-9]+}}, ![[MMDOMAIN]], !"Src"}
; MIR-NEXT: MOV64mr %0, 1, $noreg, 0, $noreg, killed %2 :: (store 8 into %ir.p0, align 4, !alias.scope ![[SET0]], !noalias ![[SET1]])		; MIR-DAG: ![[MMSCOPE1:[0-9]+]] = distinct !{!{{[0-9]+}}, ![[MMDOMAIN]], !"Dst"}
		; MIR-DAG: ![[MMSET0:[0-9]+]] = !{![[SCOPE0]], ![[MMSCOPE0]]}
		; MIR-DAG: ![[MMSET1:[0-9]+]] = !{![[SCOPE0]], ![[MMSCOPE1]]}
		; MIR-DAG: ![[MMSET2:[0-9]+]] = !{![[SCOPE1]], ![[MMSCOPE1]]}
		; MIR-DAG: ![[MMSET3:[0-9]+]] = !{![[SCOPE1]], ![[MMSCOPE0]]}
		; MIR: body:
		; MIR: %2:gr64 = MOV64rm %0, 1, $noreg, 16, $noreg :: (load 8 from %ir.p1, align 4, !alias.scope ![[MMSET0]], !noalias ![[MMSET2]])
		; MIR-NEXT: %3:gr64 = MOV64rm %0, 1, $noreg, 24, $noreg :: (load 8 from %ir.p1 + 8, align 4, !alias.scope ![[MMSET0]], !noalias ![[MMSET2]])
		; MIR-NEXT: MOV64mr %0, 1, $noreg, 8, $noreg, killed %3 :: (store 8 into %ir.p0 + 8, align 4, !alias.scope ![[MMSET1]], !noalias ![[MMSET3]])
		; MIR-NEXT: MOV64mr %0, 1, $noreg, 0, $noreg, killed %2 :: (store 8 into %ir.p0, align 4, !alias.scope ![[MMSET1]], !noalias ![[MMSET3]])
define i32 @test_memcpy_inline(i32* nocapture %p, i32* nocapture readonly %q) {		define i32 @test_memcpy_inline(i32* nocapture %p, i32* nocapture readonly %q) {
%p0 = bitcast i32* %p to i8*		%p0 = bitcast i32* %p to i8*
%add.ptr = getelementptr inbounds i32, i32* %p, i64 4		%add.ptr = getelementptr inbounds i32, i32* %p, i64 4
%p1 = bitcast i32* %add.ptr to i8*		%p1 = bitcast i32* %add.ptr to i8*
tail call void @llvm.memcpy.inline.p0i8.p0i8.i64(i8* noundef nonnull align 4 dereferenceable(16) %p0, i8* noundef nonnull align 4 dereferenceable(16) %p1, i64 16, i1 false), !alias.scope !2, !noalias !4		tail call void @llvm.memcpy.inline.p0i8.p0i8.i64(i8* noundef nonnull align 4 dereferenceable(16) %p0, i8* noundef nonnull align 4 dereferenceable(16) %p1, i64 16, i1 false), !alias.scope !2, !noalias !4
%v0 = load i32, i32* %q, align 4, !alias.scope !4, !noalias !2		%v0 = load i32, i32* %q, align 4, !alias.scope !4, !noalias !2
%q1 = getelementptr inbounds i32, i32* %q, i64 1		%q1 = getelementptr inbounds i32, i32* %q, i64 1
%v1 = load i32, i32* %q1, align 4, !alias.scope !4, !noalias !2		%v1 = load i32, i32* %q1, align 4, !alias.scope !4, !noalias !2
[... 28 lines not shown; context: define i32 @test_memset(i32* nocapture %p, i32* nocapture readonly %q) { ...]
%v0 = load i32, i32* %q, align 4, !alias.scope !4, !noalias !2		%v0 = load i32, i32* %q, align 4, !alias.scope !4, !noalias !2
%q1 = getelementptr inbounds i32, i32* %q, i64 1		%q1 = getelementptr inbounds i32, i32* %q, i64 1
%v1 = load i32, i32* %q1, align 4, !alias.scope !4, !noalias !2		%v1 = load i32, i32* %q1, align 4, !alias.scope !4, !noalias !2
%add = add i32 %v0, %v1		%add = add i32 %v0, %v1
ret i32 %add		ret i32 %add
}		}

; MIR-LABEL: name: test_mempcpy		; MIR-LABEL: name: test_mempcpy
; MIR: %2:gr64 = MOV64rm %0, 1, $noreg, 16, $noreg :: (load 8 from %ir.p1, align 1, !alias.scope ![[SET0]], !noalias ![[SET1]])		; MIR-DAG: ![[MMDOMAIN:[0-9]+]] = distinct !{!{{[0-9]+}}, !"MemcpyLoweringDomain"}
; MIR-NEXT: %3:gr64 = MOV64rm %0, 1, $noreg, 24, $noreg :: (load 8 from %ir.p1 + 8, align 1, !alias.scope ![[SET0]], !noalias ![[SET1]])		; MIR-DAG: ![[MMSCOPE0:[0-9]+]] = distinct !{!{{[0-9]+}}, ![[MMDOMAIN]], !"Src"}
; MIR-NEXT: MOV64mr %0, 1, $noreg, 8, $noreg, killed %3 :: (store 8 into %ir.p0 + 8, align 1, !alias.scope ![[SET0]], !noalias ![[SET1]])		; MIR-DAG: ![[MMSCOPE1:[0-9]+]] = distinct !{!{{[0-9]+}}, ![[MMDOMAIN]], !"Dst"}
; MIR-NEXT: MOV64mr %0, 1, $noreg, 0, $noreg, killed %2 :: (store 8 into %ir.p0, align 1, !alias.scope ![[SET0]], !noalias ![[SET1]])		; MIR-DAG: ![[MMSET0:[0-9]+]] = !{![[SCOPE0]], ![[MMSCOPE0]]}
		; MIR-DAG: ![[MMSET1:[0-9]+]] = !{![[SCOPE0]], ![[MMSCOPE1]]}
		; MIR-DAG: ![[MMSET2:[0-9]+]] = !{![[SCOPE1]], ![[MMSCOPE1]]}
		; MIR-DAG: ![[MMSET3:[0-9]+]] = !{![[SCOPE1]], ![[MMSCOPE0]]}
		; MIR: body:
		; MIR: %2:gr64 = MOV64rm %0, 1, $noreg, 16, $noreg :: (load 8 from %ir.p1, align 1, !alias.scope ![[MMSET0]], !noalias ![[MMSET2]])
		; MIR-NEXT: %3:gr64 = MOV64rm %0, 1, $noreg, 24, $noreg :: (load 8 from %ir.p1 + 8, align 1, !alias.scope ![[MMSET0]], !noalias ![[MMSET2]])
		; MIR-NEXT: MOV64mr %0, 1, $noreg, 8, $noreg, killed %3 :: (store 8 into %ir.p0 + 8, align 1, !alias.scope ![[MMSET1]], !noalias ![[MMSET3]])
		; MIR-NEXT: MOV64mr %0, 1, $noreg, 0, $noreg, killed %2 :: (store 8 into %ir.p0, align 1, !alias.scope ![[MMSET1]], !noalias ![[MMSET3]])
define i32 @test_mempcpy(i32* nocapture %p, i32* nocapture readonly %q) {		define i32 @test_mempcpy(i32* nocapture %p, i32* nocapture readonly %q) {
%p0 = bitcast i32* %p to i8*		%p0 = bitcast i32* %p to i8*
%add.ptr = getelementptr inbounds i32, i32* %p, i64 4		%add.ptr = getelementptr inbounds i32, i32* %p, i64 4
%p1 = bitcast i32* %add.ptr to i8*		%p1 = bitcast i32* %add.ptr to i8*
%call = tail call i8* @mempcpy(i8* noundef nonnull align 4 dereferenceable(16) %p0, i8* noundef nonnull align 4 dereferenceable(16) %p1, i64 16), !alias.scope !2, !noalias !4		%call = tail call i8* @mempcpy(i8* noundef nonnull align 4 dereferenceable(16) %p0, i8* noundef nonnull align 4 dereferenceable(16) %p1, i64 16), !alias.scope !2, !noalias !4
%v0 = load i32, i32* %q, align 4, !alias.scope !4, !noalias !2		%v0 = load i32, i32* %q, align 4, !alias.scope !4, !noalias !2
%q1 = getelementptr inbounds i32, i32* %q, i64 1		%q1 = getelementptr inbounds i32, i32* %q, i64 1
%v1 = load i32, i32* %q1, align 4, !alias.scope !4, !noalias !2		%v1 = load i32, i32* %q1, align 4, !alias.scope !4, !noalias !2
[... 16 lines not shown ...]
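
Reader's note on the MIR checks above: the new expectations give the lowered load `!alias.scope {%p, Src}` / `!noalias {%q, Dst}` and the lowered store `!alias.scope {%p, Dst}` / `!noalias {%q, Src}`. The sketch below is a small stand-alone model of the scoped-noalias rule as described in the LLVM Language Reference (two accesses are assumed not to alias if, for some domain, the scopes of one access's `alias.scope` in that domain are all contained in the other's `noalias` list). It is not the code from this patch or from LLVM's alias analysis; `Scope`, `Access`, `mayAliasInScopes`, `mayAlias`, and `selfCheck` are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of one alias scope: its name and the domain it lives in.
struct Scope {
  std::string name;
  std::string domain;
};

// Hypothetical model of a memory access carrying both metadata lists.
struct Access {
  std::vector<Scope> aliasScope;
  std::vector<Scope> noAlias;
};

// Scopes named after the ones the tests check for.
const Scope ScopeP{"bax: %p", "bax"};
const Scope ScopeQ{"bax: %q", "bax"};
const Scope ScopeSrc{"Src", "MemcpyLoweringDomain"};
const Scope ScopeDst{"Dst", "MemcpyLoweringDomain"};

// Returns false when, for some domain, `scopes` has at least one scope in
// that domain and every such scope also appears in `noAlias`.
bool mayAliasInScopes(const std::vector<Scope> &scopes,
                      const std::vector<Scope> &noAlias) {
  std::map<std::string, std::set<std::string>> naByDomain;
  for (const Scope &s : noAlias)
    naByDomain[s.domain].insert(s.name);

  for (const auto &entry : naByDomain) {
    bool any = false;
    bool allCovered = true;
    for (const Scope &s : scopes) {
      if (s.domain != entry.first)
        continue;
      any = true;
      if (entry.second.count(s.name) == 0)
        allCovered = false;
    }
    if (any && allCovered)
      return false; // provably disjoint in this domain
  }
  return true;
}

// The rule is applied in both directions.
bool mayAlias(const Access &a, const Access &b) {
  return mayAliasInScopes(a.aliasScope, b.noAlias) &&
         mayAliasInScopes(b.aliasScope, a.noAlias);
}

// Replays the metadata sets from the old and new MIR checks.
void selfCheck() {
  // Old expectation: load and store both inherit the memcpy's metadata.
  Access oldLoad{{ScopeP}, {ScopeQ}};
  Access oldStore{{ScopeP}, {ScopeQ}};
  assert(mayAlias(oldLoad, oldStore));

  // New expectation: Src/Dst scopes are appended on lowering.
  Access newLoad{{ScopeP, ScopeSrc}, {ScopeQ, ScopeDst}};
  Access newStore{{ScopeP, ScopeDst}, {ScopeQ, ScopeSrc}};
  assert(!mayAlias(newLoad, newStore));

  // A load through %q stays independent of the lowered store to %p.
  Access loadQ{{ScopeQ}, {ScopeP}};
  assert(!mayAlias(loadQ, newStore));
}
```

Under this model the old metadata cannot separate the lowered load from the lowered store, while the added `Src`/`Dst` scopes can, which is consistent with the scheduling changes noted in the inline comments.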

Revision Contents

Diff 355096

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp

llvm/test/CodeGen/AArch64/arm64-2012-05-07-MemcpyAlignBug.ll

llvm/test/CodeGen/AArch64/arm64-memcpy-inline.ll

llvm/test/CodeGen/AArch64/arm64-misaligned-memcpy-inline.ll

llvm/test/CodeGen/AArch64/memcpy-scoped-aa.ll

llvm/test/CodeGen/AMDGPU/memcpy-scoped-aa.ll

llvm/test/CodeGen/PowerPC/pr45301.ll

llvm/test/CodeGen/X86/memcpy-scoped-aa.ll
