This is an archive of the discontinued LLVM Phabricator instance.

[CGP] Add generic TargetLowering::shouldAlignPointerArgs() implementation
Accepted · Public

Authored by arichardson on Sep 20 2022, 6:23 AM.

Details

Summary

This function was added for ARM targets, but aligning global/stack pointer
arguments passed to memcpy/memmove/memset can improve code size and
performance for all targets that don't have fast unaligned accesses.
This adds a generic implementation that adjusts the alignment to pointer
size if unaligned accesses are slow.
Review D134168 suggests that this significantly improves performance on
synthetic benchmarks such as Dhrystone on RV32 as it avoids memcpy() calls.

TODO: It should also improve performance for other benchmarks; it would be
good to get some numbers.
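
For reference, here is a minimal sketch of what the generic default described above could look like, assuming the hook keeps the existing shouldAlignPointerArgs(CallInst *, unsigned &MinSize, Align &PrefAlign) shape and that allowsMisalignedMemoryAccesses() is used as the "unaligned accesses are slow" check; the actual diff may differ in the details.

// Sketch only, not the exact code in this revision.
bool TargetLoweringBase::shouldAlignPointerArgs(CallInst *CI, unsigned &MinSize,
                                                Align &PrefAlign) const {
  // Only memcpy/memmove/memset calls are interesting here.
  if (!isa<MemIntrinsic>(CI))
    return false;
  const DataLayout &DL = CI->getModule()->getDataLayout();
  // If misaligned pointer-width accesses are already fast, leave things alone.
  unsigned Fast = 0;
  if (allowsMisalignedMemoryAccesses(getPointerMemTy(DL), /*AddrSpace=*/0,
                                     Align(1), MachineMemOperand::MONone,
                                     &Fast) &&
      Fast)
    return false;
  // Otherwise request pointer-size alignment, but only for objects that are
  // at least pointer-sized; smaller copies gain nothing from over-alignment.
  MinSize = DL.getPointerSize();
  PrefAlign = DL.getPointerPrefAlignment();
  return true;
}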

Diff Detail

Event Timeline

arichardson created this revision. Sep 20 2022, 6:23 AM
Herald added a project: Restricted Project. Sep 20 2022, 6:23 AM
arichardson requested review of this revision. Sep 20 2022, 6:23 AM
arichardson added inline comments.
llvm/test/Transforms/CodeGenPrepare/RISCV/adjust-memintrin-alignment.ll
3

Not sure if there is a way to get the default target datalayout from opt, so I've hardcoded the relevant bits here.

efriedma added inline comments. Sep 20 2022, 2:05 PM
llvm/lib/CodeGen/TargetLoweringBase.cpp
967

The question is what size load/store ops would we prefer to use for the memcpy, and whether those ops require alignment. Using the alignment of a pointer seems arbitrary; we aren't loading or storing a pointer.

I'd prefer not to call getPointerAlignment() here if we can avoid it; the caller already does the math to figure out the current alignment and the increase. shouldUpdatePointerArgAlignment just needs to know what alignment it wants, not whether the call currently satisfies that alignment.
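
For context, this is roughly what the existing caller in CodeGenPrepare::optimizeCallInst() does with the hook's result (simplified from the in-tree code; the hook may also end up renamed to shouldUpdatePointerArgAlignment by this patch): the hook only reports MinSize/PrefAlign, and the caller skips any object whose alignment is already sufficient.

unsigned MinSize;
Align PrefAlign;
if (TLI->shouldAlignPointerArgs(CI, MinSize, PrefAlign)) {
  for (auto &Arg : CI->args()) {
    if (!Arg->getType()->isPointerTy())
      continue;
    APInt Offset(DL->getIndexSizeInBits(
                     cast<PointerType>(Arg->getType())->getAddressSpace()),
                 0);
    Value *Val = Arg->stripAndAccumulateInBoundsConstantOffsets(*DL, Offset);
    uint64_t Off = Offset.getLimitedValue();
    if (!isAligned(PrefAlign, Off))
      continue;
    // Only raise the alignment of allocas/globals that are large enough and
    // currently under-aligned; otherwise nothing is touched.
    if (auto *AI = dyn_cast<AllocaInst>(Val))
      if (AI->getAlign() < PrefAlign &&
          DL->getTypeAllocSize(AI->getAllocatedType()) >= MinSize + Off)
        AI->setAlignment(PrefAlign);
    if (auto *GV = dyn_cast<GlobalVariable>(Val))
      if (GV->canIncreaseAlignment() &&
          GV->getPointerAlignment(*DL) < PrefAlign &&
          DL->getTypeAllocSize(GV->getValueType()) >= MinSize + Off)
        GV->setAlignment(PrefAlign);
  }
}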

llvm/lib/Target/ARM/ARMISelLowering.cpp
1927

You want to make this more aggressive by default? Maybe... but we probably want different heuristics for small copies. (For example, aligning a 3-byte copy to 8 bytes makes no sense; we can't take advantage of alignment greater than 2 bytes.)
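
(For reference, the current in-tree ARM override, paraphrased below with approximate comments, already uses MinSize to avoid over-aligning very small copies:)

bool ARMTargetLowering::shouldAlignPointerArgs(CallInst *CI, unsigned &MinSize,
                                               Align &PrefAlign) const {
  if (!isa<MemIntrinsic>(CI))
    return false;
  // Objects smaller than 8 bytes are never over-aligned.
  MinSize = 8;
  // On ARMv6 and later (excluding M-class), 8-byte-aligned LDM is typically
  // faster than 4-byte-aligned LDM.
  PrefAlign =
      (Subtarget->hasV6Ops() && !Subtarget->isMClass() ? Align(8) : Align(4));
  return true;
}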

JojoR added inline comments. Sep 20 2022, 8:24 PM
llvm/lib/CodeGen/TargetLoweringBase.cpp
967

As @efriedma said, we can set the default PrefAlign according to PointerSize,
but we should return the final PrefAlign from the backend due to the backend's requirements;
different ISAs may have different alignments :)

Address feedback

arichardson added inline comments. Sep 28 2022, 4:08 AM
llvm/lib/CodeGen/TargetLoweringBase.cpp
967

I agree that this is a backend-specific choice. I would assume loading an aligned pointer is an efficient operation on most/all targets, and for the ones where this is not true, they can override shouldUpdatePointerArgAlignment().

I believe this change should now match what you did for RISC-V: MinSize == XLEN, PrefAlign == XLEN.

llvm/lib/Target/ARM/ARMISelLowering.cpp
1927

I've reverted this part of the diff and added a test to show we don't adjust 3/7-byte objects.

arichardson added inline comments. Sep 28 2022, 4:11 AM
llvm/lib/CodeGen/TargetLoweringBase.cpp
966

@jrtc27 we may want to adjust these values for CHERI to not require 16-byte alignment & size, but I think even without an 8-byte fallback this should be a (minor) net win.

efriedma added inline comments. Sep 28 2022, 10:55 AM
llvm/include/llvm/CodeGen/TargetLowering.h
1933

It looks like the argument "Arg" is now unused?

Fix unused argument

efriedma added inline comments. Oct 4 2022, 12:31 PM
llvm/include/llvm/CodeGen/TargetLowering.h
1933

Still here? Did you mean to upload a different change?

arichardson added inline comments. Oct 4 2022, 12:36 PM
llvm/include/llvm/CodeGen/TargetLowering.h
1933

I was using it for the allowsMisalignedMemoryAccesses() alignment check, but dropped it in the previous diff.
I think having the argument that is currently being processed could be useful for overrides (since they could make decisions based on its current alignment).

I've added the use back now, but am also happy to drop it.

1933

Arg is used in the call to allowsMisalignedMemoryAccesses(getPointerMemTy(), ...) to determine whether it's already a fast operation.
I can drop it if you prefer; I just thought it is potentially useful to avoid additional alignment changes.

efriedma added inline comments. Oct 7 2022, 11:47 AM
llvm/include/llvm/CodeGen/TargetLowering.h
1933

I'm not sure how calling getPointerAlignment() avoids additional alignment, assuming PrefAlign is correct. shouldAlignPointerArgs is supposed to return the minimum "fast" alignment for the given call. If the actual alignment is already greater than or equal to that, the caller does nothing, even if shouldAlignPointerArgs returns true.

(If there's some alignment between 1 and PrefAlign that the target considers "fast", I guess you could run into an issue, but that implies PrefAlign is wrong.)

llvm/lib/CodeGen/TargetLoweringBase.cpp
967

Maybe TargetTransformInfo::getRegisterBitWidth() is a better default than getPointerPrefAlignment()? I guess that's the same thing for most targets, but it probably makes the intent a bit more clear...

I agree there isn't any default that's going to be correct for all targets.
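
(A rough sketch of that intent, purely illustrative: it assumes a TargetTransformInfo instance is reachable from wherever the default ends up living, which is not a given inside TargetLoweringBase.)

// Hypothetical: derive PrefAlign/MinSize from the widest scalar register
// rather than from the pointer's preferred alignment.
TypeSize RegBits = TTI.getRegisterBitWidth(TargetTransformInfo::RGK_Scalar);
uint64_t RegBytes = std::max<uint64_t>(1, RegBits.getFixedValue() / 8);
Align PrefAlign(RegBytes); // scalar register widths are powers of two
unsigned MinSize = RegBytes;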

arichardson planned changes to this revision. Oct 7 2022, 12:08 PM
arichardson added inline comments.
llvm/include/llvm/CodeGen/TargetLowering.h
1933

Fair enough, I'll drop the argument and update the patch to avoid looking at the current alignment.

llvm/lib/CodeGen/TargetLoweringBase.cpp
967

I was not aware of that function. Thanks for pointing it out; that does indeed sound like the best default.

Address review feedback

efriedma added inline comments. Nov 16 2022, 4:07 PM
llvm/lib/CodeGen/TargetLoweringBase.cpp
967

Did you mean to address this?

Address missed feedback - adds two more affected tests due to pointer/scalar register size differences

Rebase, fix unused variable.

efriedma accepted this revision. Feb 7 2023, 6:14 PM

LGTM

This revision is now accepted and ready to land. Feb 7 2023, 6:14 PM

(I apologize for the delay here. I meant to get to this earlier.)

That error is probably going to be hard for anyone on a non-AIX machine to reproduce; a reduced testcase would be helpful.

This runs into the error "The symbol .rodata.str1.1L...str is not defined." if we specify an optimization flag greater than 0.

clang++ -O1 -c foo.cc

#include <string>
namespace benchmark {
class State {
public:
  void SkipWithError(char *);
};
} // benchmark
struct a {
  std::string b;
};
int c(char *, std::initializer_list< a >);
void e(benchmark::State f) { f.SkipWithError("error message"); }
int g = c("", {{"error message"}});

If it's too difficult to debug this test case without an AIX machine, please let me know.

We need a series of commands that could be run on a non-AIX machine to identify the problem. So explicitly specify the correct triple, and don't depend on external headers (you can use preprocessed source if necessary).

It fails with a different assembler error on different platforms (tried on AIX, linux, and mac). No error if you remove the AIX target.

clang++ -target powerpc64-ibm-aix -c foo.cc
namespace {
template <class a, class d> void aa(a, d);
template <class> struct e;
class g;
template <class ac, class = ac, class = g> class h;
template <class ad, class f> void i(ad *j, ad *k, f l) {
  long a(k - j);
  __builtin_memmove(l, j, a);
  aa(j, l);
}
template <class af, class ag, class ah> void ai(af j, ag k, ah l) {
  i(j, k, l);
}
template <class aj, class ak> void al(aj j, aj k, ak l) { ai(j, k, l); }
template <class aj, class am, class ak> ak an(aj j, am k, ak l) {
  al(j, j + k, l);
}
template <> struct e<char> {
  static char m(char *j, const char *k, long l) { an(k, l, j); }
};
template <class> struct as {
  as(int);
};
template <class d> class n : as<int>, as<d> {
public:
  using au = as;
  using av = as<d>;
  template <class aw, class ax> n(aw, ax) : au(0), av(0) {}
};
template <class, class, class> class h {
  n<g> ay;

public:
  h(char *j) : ay(int(), int()) {
    long b;
    o(j, b);
  }
  void o(const char *, int);
};
template <class ac, class bc, class bd>
void h<ac, bc, bd>::o(const char *j, int k) {
  e<char>::m(0, j, k);
}
} // namespace
struct {
  h<int> b;
} c{"error message"};

> It fails with a different assembler error on different platforms (tried on AIX, linux, and mac). No error if you remove the AIX target.
>
> clang++ -target powerpc64-ibm-aix -c foo.cc

@Jake-Egan,

I had to add -O before I got the bad assembly:

clang++ -target powerpc64-ibm-aix -O -S -o - foo.cc

The bad assembly does not have the 1-byte element/1-byte alignment read-only string section defined, but it has the TOC entry for it:

        .toc
L..C0:
        .tc .rodata.str1.1L...str[TC],.rodata.str1.1L...str[RO]
L..C1:
        .tc .rodata.str1.8L...str[TC],.rodata.str1.8L...str[RO]

Here's an IR-only example that exhibits the invalid AIX assembly after this change:

target datalayout = "E-m:a-i64:64-n32:64-S128-v256:256:256-v512:512:512"
target triple = "powerpc64-ibm-aix"

@.str = private unnamed_addr constant [14 x i8] c"error message\00"

define internal fastcc void @_ZN12_GLOBAL__N_12anIPKclPcEET1_T_T0_S4_() {
entry:
  store i64 0, ptr @.str, align 8
  unreachable
}

define internal fastcc void @_ZN12_GLOBAL__N_12alIPKcPcEEvT_S4_T0_(i64 %0) {
entry:
  tail call void @llvm.memcpy.p0.p0.i64(ptr null, ptr @.str, i64 %0, i1 false)
  ret void
}

; Function Attrs: nocallback nofree nounwind willreturn memory(argmem: readwrite)
declare void @llvm.memcpy.p0.p0.i64(ptr noalias nocapture writeonly, ptr noalias nocapture readonly, i64, i1 immarg) #0

attributes #0 = { nocallback nofree nounwind willreturn memory(argmem: readwrite) }

We see the same TOC reference without a def:

.csect .rodata.str1.8L...str[RO],2
.tc .rodata.str1.1L...str[TC],.rodata.str1.1L...str[RO]
.tc .rodata.str1.8L...str[TC],.rodata.str1.8L...str[RO]

The problem seems to lie in the fact that we codegen the first function with no explicit alignment for @.str, resulting in TOC references to the .rodata.str1.1L section, but after this change CodeGenPrepare raises the alignment on @.str to 8, which changes the MCSection (i.e. csect) name for the final object we emit.
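
To make the mismatch concrete: the mergeable-string csect name encodes both the entry size and the alignment, so raising the alignment moves the string into a differently named csect. The helper below is illustrative only (it is not the actual XCOFF section-selection code), assuming the usual ".rodata.str<EntrySize>.<Alignment>" naming.

// Illustrative only: with this naming scheme, raising @.str's alignment from
// 1 to 8 moves it from a ".rodata.str1.1..." csect to a ".rodata.str1.8..."
// csect, while the TOC entry emitted earlier still names the old csect.
std::string mergeableStrSectionName(unsigned EntrySizeInBytes, Align A) {
  return ".rodata.str" + std::to_string(EntrySizeInBytes) + "." +
         std::to_string(A.value());
}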

efriedma reopened this revision. Feb 23 2023, 12:20 PM

Is there any way we can make AIX targets more resilient handling this kind of alignment change? If we have to, we can choose a point after which passes aren't allowed to increase the alignment of globals, and move this transform before that. But I'd like to avoid that if possible.

This revision is now accepted and ready to land. Feb 23 2023, 12:20 PM

> Is there any way we can make AIX targets more resilient handling this kind of alignment change? If we have to, we can choose a point after which passes aren't allowed to increase the alignment of globals, and move this transform before that. But I'd like to avoid that if possible.

Not that I know of without negative side-effects. For example, if we access using a TOC entry for a label for the individual string, we would end up with more TOC entries which, in turn, can lead to TOC overflow.

As info, the object generation with this patch leads to:

llvm-project/llvm/lib/MC/XCOFFObjectWriter.cpp:620: virtual void (anonymous namespace)::XCOFFObjectWriter::recordRelocation(llvm::MCAssembler &, const llvm::MCAsmLayout &, const llvm::MCFragment *, const llvm::MCFixup &, llvm::MCValue, uint64_t &): Assertion `SectionMap.find(SymASec) != SectionMap.end() && "Expected containing csect to exist in map."' failed.

> Is there any way we can make AIX targets more resilient handling this kind of alignment change? If we have to, we can choose a point after which passes aren't allowed to increase the alignment of globals, and move this transform before that. But I'd like to avoid that if possible.
>
> Not that I know of without negative side-effects. For example, if we access using a TOC entry for a label for the individual string, we would end up with more TOC entries which, in turn, can lead to TOC overflow.

I'm wondering if there is really a need to split these rodata csects by alignment the way we do for XCOFF. This behaviour seems to have its origin in the ELF mergeable-string handling, and we don't have the same linker features.

I have a draft of such a change, that when combined with this patch, seems to resolve the issue we are seeing. I'll do some more evaluation offline to see if this is a viable resolution and report back.

> Not that I know of without negative side-effects. For example, if we access using a TOC entry for a label for the individual string, we would end up with more TOC entries which, in turn, can lead to TOC overflow.

Looking at a case with more than one string, it seems we currently have separate csects and TOC entries for each string anyway. I am not sure that is a state we want to stay in though. I do not want to make changing that more difficult.

> Not that I know of without negative side-effects. For example, if we access using a TOC entry for a label for the individual string, we would end up with more TOC entries which, in turn, can lead to TOC overflow.
>
> Looking at a case with more than one string, it seems we currently have separate csects and TOC entries for each string anyway. I am not sure that is a state we want to stay in though.

Whether there are separate csects actually depends on the -fdata-sections setting, though I'm surprised about the extra TOC entries. Agree that's probably not a state we want to stay in.

> I do not want to make changing that more difficult.

I'm not sure removing the alignment from the csect identifier will complicate those orthogonal changes, but let me post the patch and we can continue the discussion there.

Did the AIX string issues discussed here ever get resolved?

> Did the AIX string issues discussed here ever get resolved?

https://reviews.llvm.org/D156202 removes the alignment-sensitive XCOFF csect determination, so it should resolve the original AIX issue with this patch.

There may be interactions between this patch and https://reviews.llvm.org/D155730. @stefanp, can you comment?

> There may be interactions between this patch and https://reviews.llvm.org/D155730

They shouldn't interact? Or at least, they shouldn't interact in a way that affects correctness; the proposed PPCMergeStringPool is a ModulePass.

> There may be interactions between this patch and https://reviews.llvm.org/D155730
>
> They shouldn't interact? Or at least, they shouldn't interact in a way that affects correctness; the proposed PPCMergeStringPool is a ModulePass.

Maybe there is no correctness issue, but pipeline ordering matters for the opportunities that this patch is meant to enable.

Also, (and I hope I am not mischaracterizing what was said) my understanding, from earlier "offline" discussion with @stefanp, is that modifying the properties of a global variable (as this patch enables more widely) is inappropriate to do in a function pass like CodeGenPrepare.

Perhaps there are good reasons to expect that increasing the alignment is "safe", but it has a good chance of causing timing issues (such as changing function codegen when functions are reordered within a TU).

Whether it's a FunctionPass doesn't really matter; despite the "layering violation", there aren't any data structures that actually care. getOrEnforceKnownAlignment has been doing similar modifications for a long time. I guess I can see an argument that we should try to avoid alignment modifications for all globals after isel has run for any global. Not sure what, exactly, that implies for the latest point it's legal to modify a global.

We could try to move the optimization much earlier, like into the InferAlignment pass proposed in D158529.

> getOrEnforceKnownAlignment has been doing similar modifications for a long time. I guess I can see an argument that we should try to avoid alignment modifications for all globals after isel has run for any global. Not sure what, exactly, that implies for the latest point it's legal to modify a global.
>
> We could try to move the optimization much earlier, like into the InferAlignment pass proposed in D158529.

I was just informed that @stefanp is away until some time next week. I am hoping to get his input on the importance of trying to get to a "better state" (both in general and for the specific case of the current optimization).

> getOrEnforceKnownAlignment has been doing similar modifications for a long time. I guess I can see an argument that we should try to avoid alignment modifications for all globals after isel has run for any global. Not sure what, exactly, that implies for the latest point it's legal to modify a global.
>
> We could try to move the optimization much earlier, like into the InferAlignment pass proposed in D158529.
>
> I was just informed that @stefanp is away until some time next week. I am hoping to get his input on the importance of trying to get to a "better state" (both in general and for the specific case of the current optimization).

Sorry for the late reply.

I am certainly a little nervous about changing global variables in a function pass. If two functions exist with access to the same global, is this going to cause a problem when the alignment is changed? Is it possible for one function to act on a changed alignment and the other to act on the original alignment? At this point I can't think of a situation like that, so it's probably fine.

In terms of interaction with the String Pooling pass, I would say that it mainly depends on the order in which the passes run. If the string pool pass runs first, then we may miss the opportunity to over-align something here since the pool is treated as a single variable. If this pass runs first, then the string pooling pass will miss pooling a global variable if it is over-aligned. This is actually a limitation of the string pooling pass at the moment, but I was hoping to fix it later. In either situation I don't see a functional issue.