This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
clang/
-
include/clang/Basic/
-
clang/
-
Basic/
-
Attr.td
-
AttrDocs.td
-
Builtins.def
-
lib/
-
CodeGen/
-
CGBuilder.h
-
CGBuiltin.cpp
-
CGCall.cpp
-
Sema/
-
SemaDeclAttr.cpp
-
llvm/
-
docs/
-
LangRef.rst
-
include/llvm/IR/
-
llvm/
-
IR/
-
IRBuilder.h
-
IntrinsicInst.h
-
Intrinsics.td
-
lib/
-
CodeGen/SelectionDAG/
-
SelectionDAG/
-
SelectionDAGBuilder.cpp
-
IR/
-
IRBuilder.cpp
-
Target/X86/
-
X86/
-
X86SelectionDAGInfo.cpp
-
test/CodeGen/X86/
-
CodeGen/
-
X86/
-
memcpy-inline.ll
-
memcpy.ll

Differential D61634

[clang/llvm] Allow efficient implementation of libc's memory functions in C/C++
AbandonedPublic

Authored by gchatelet on May 7 2019, 2:41 AM.

Download Raw Diff

Details

Reviewers

courbet
theraven
t.p.northover
jdoerfert

Summary

POC for the rfc http://lists.llvm.org/pipermail/llvm-dev/2019-April/131973.html

It's still work in progress but gives the general idea.

Diff Detail

Repository

rG LLVM Github Monorepo

Build Status

Buildable 33006
Build 33005: arc lint + arc unit

Event Timeline

gchatelet created this revision.May 7 2019, 2:41 AM

Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptMay 7 2019, 2:41 AM

Herald added subscribers: llvm-commits, cfe-commits, mgrang, hiraditya. · View Herald Transcript

Harbormaster completed remote builds in B31512: Diff 198421.May 7 2019, 2:44 AM

gchatelet edited the summary of this revision. (Show Details)May 7 2019, 2:46 AM

gchatelet added reviewers: theraven, t.p.northover.

Add documentation.
Fix permissive HasNoRuntimeAttribute

Herald added a subscriber: jdoerfert. · View Herald TranscriptMay 7 2019, 4:41 AM

Harbormaster completed remote builds in B31520: Diff 198436.May 7 2019, 4:42 AM

I wonder if a list of comma-separated names is the right approach. Would it make more sense to add a new attribute for each of the helpers, such as #no-runtime-for-memcpy? That should make querying simpler (one lookup in the table, rather than a lookup and a string scan) and also make it easier to add and remove attributes (merging is now just a matter of trying to add all of them, with the existing logic to deduplicate repeated attributes working).

We probably need to also carefully audit all of the transform passes to find ones that insert calls. Last time I looked, there were four places in LLVM where memcpy could be expanded, I wonder if there are a similar number where it can be synthesised...

In D61634#1493350, @theraven wrote:

I wonder if a list of comma-separated names is the right approach. Would it make more sense to add a new attribute for each of the helpers, such as #no-runtime-for-memcpy? That should make querying simpler (one lookup in the table, rather than a lookup and a string scan) and also make it easier to add and remove attributes (merging is now just a matter of trying to add all of them, with the existing logic to deduplicate repeated attributes working).

So I decided to go that route for two reasons:

The no_runtime_for attribute will be used almost exclusively by runtime implementers, on average lookup will return false and the parsing part should be marginal (famous last words?)
We need to support every function in TargetLibraryInfo (I count 434 of them) and I'm not sure adding one attribute per function will scale or can stay in sync.

Now I haven't thought about merging indeed, we may be able to reuse or clone the approach used for target-features?
For instance some backend disable specific functions and we may warn the user if it happens. e.g. setLibcallName(RTLIB::SHL_I128, nullptr); in X86ISelLowering.cpp

I'm not saying one attribute per helper is not feasible but I'd like to put it into perspective with other constraints.

We probably need to also carefully audit all of the transform passes to find ones that insert calls. Last time I looked, there were four places in LLVM where memcpy could be expanded, I wonder if there are a similar number where it can be synthesised...

Yes indeed, it's going to be long and painful, here are the functions calling CallLoweringInfo::setLibCallee :

llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp

ExpandLibCall(LC, Node, isSigned)
ExpandChainLibCall(LC, Node, isSigned)
ExpandDivRemLibCall(Node, Results)
ExpandSinCosLibCall(Node, Results)
ConvertNodeToLibcall(Node)
ConvertNodeToLibcall(Node)

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp

ExpandIntRes_XMULO(N, Lo, Hi)

llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.cpp

ExpandChainLibCall(LC, Node, isSigned)

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp

getMemcpy(Chain, dl, Dst, Src, Size, Align, isVol, AlwaysInline, isTailCall, DstPtrInfo, SrcPtrInfo)
getAtomicMemcpy(Chain, dl, Dst, DstAlign, Src, SrcAlign, Size, SizeTy, ElemSz, isTailCall, DstPtrInfo, SrcPtrInfo)
getMemmove(Chain, dl, Dst, Src, Size, Align, isVol, isTailCall, DstPtrInfo, SrcPtrInfo)
getAtomicMemmove(Chain, dl, Dst, DstAlign, Src, SrcAlign, Size, SizeTy, ElemSz, isTailCall, DstPtrInfo, SrcPtrInfo)
getMemset(Chain, dl, Dst, Src, Size, Align, isVol, isTailCall, DstPtrInfo)
getAtomicMemset(Chain, dl, Dst, DstAlign, Value, Size, SizeTy, ElemSz, isTailCall, DstPtrInfo)

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

visitIntrinsicCall(I, Intrinsic)

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp

makeLibCall(DAG, LC, RetVT, Ops, isSigned, dl, doesNotReturn, isReturnValueUsed, isPostTypeLegalization)
LowerToTLSEmulatedModel(GA, DAG)

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

LowerFSINCOS(Op, DAG)

llvm/lib/Target/AArch64/AArch64SelectionDAGInfo.cpp

EmitTargetCodeForMemset(DAG, dl, Chain, Dst, Src, Size, Align, isVolatile, DstPtrInfo)

llvm/lib/Target/ARM/ARMISelLowering.cpp

LowerToTLSGeneralDynamicModel(GA, DAG)

llvm/lib/Target/ARM/ARMSelectionDAGInfo.cpp

EmitSpecializedLibcall(DAG, dl, Chain, Dst, Src, Size, Align, LC)

llvm/lib/Target/Hexagon/HexagonSelectionDAGInfo.cpp

EmitTargetCodeForMemcpy(DAG, dl, Chain, Dst, Src, Size, Align, isVolatile, AlwaysInline, DstPtrInfo, SrcPtrInfo)

llvm/lib/Target/PowerPC/PPCISelLowering.cpp

LowerINIT_TRAMPOLINE(Op, DAG)

llvm/lib/Target/X86/X86ISelLowering.cpp

LowerWin64_i128OP(Op, DAG)
LowerFSINCOS(Op, Subtarget, DAG)

llvm/lib/Target/X86/X86SelectionDAGInfo.cpp

EmitTargetCodeForMemset(DAG, dl, Chain, Dst, Val, Size, Align, isVolatile, DstPtrInfo)

I would be careful about trying to over-generalize here. There are a few different related bits of functionality which seem to be interesting, given the discussion in the llvm-dev thread, here, and in related patches:

The ability to specify -fno-builtin* on a per-function level, using function attributes.
Improved optimization when -fno-builtin-memcpy is specified.
The ability to avoid calls to memcpy for certain C constructs which would naturally be lowered to a memcpy call, like struct assignment of large structs, or explicit calls to __builtin_memcpy(). Maybe also some generalization of this involving other libc/libm/compiler-rt calls.
The ability to force the compiler to generate "rep; movs" on x86 without inline asm.

It's not clear to me that all of this should be tied together. In particular, I'm not sure -fno-builtin-memcpy should imply the compiler never generates a call to memcpy(). On recent x86 chips, you might be able to get away with unconditionally using "rep movs", but generally an efficient memcpy for more than a few bytes is a lot longer than one instruction, and is not something reasonable for the compiler to synthesize inline.

If we're adding new IR attributes here, we should also consider the interaction with LTO.

I'm not convinced by the approach.

Can it still recognize the loop idiom into memcpy implementation but use memmove (as only memcpy has been blacklisted)?
Can it do the same for memmove (for the case without overlap), and end-up with infinite recursion?

I have a feeling we should pick a stance:

either not allow the compiler to lower a builtin to a call to the library; (my preferred choice, but I'm biased)
or not allow the library to use compiler builtins (but LTO flow with the runtime library *already* linked smells like trouble if we go this way).

The reason for my bias is that I have a multi-memcpy codegen in the compiler to generate those two calls:

memcpy(left,  in, n);
memcpy(right, in, n);

with a single loop.

In D61634#1493927, @efriedma wrote:

I would be careful about trying to over-generalize here. There are a few different related bits of functionality which seem to be interesting, given the discussion in the llvm-dev thread, here, and in related patches:

Thx for the feedback @efriedma, I don't fully understand what you're suggesting here so I will try to reply inline.

The ability to specify -fno-builtin* on a per-function level, using function attributes.

-fno-builtin* is about preventing clang/llvm from recognizing that a piece of code has the same semantic as a particular IR intrinsic, it has nothing to do with preventing the compiler from generating runtime calls.

fno-builtin is about transformation from code to IR (frontend)
The RFC is about the transformation from IR to runtime calls (backend)

Improved optimization when -fno-builtin-memcpy is specified.

I don't see this happening because if -fno-builtin-memcpyis used, clang (frontend) might already have unrolled and vectorized the loop, It is then very hard - by simply looking at the IR - to recognize that it's a memcpy and generate good code (e.g. https://godbolt.org/z/JZ-mR0)
Here we really want the compiler to understand that we are copying memory (i.e. this is really @llvm.memcpy semantic) but we want to prevent it from calling the runtime.

The ability to avoid calls to memcpy for certain C constructs which would naturally be lowered to a memcpy call, like struct assignment of large structs, or explicit calls to __builtin_memcpy(). Maybe also some generalization of this involving other libc/libm/compiler-rt calls.

I believe very few people will use the attribute described in the RFC, it will most probably be library maintainers that already know a good deal of how the compiler is allowed to transform the code.

The ability to force the compiler to generate "rep; movs" on x86 without inline asm.

This is not strictly required - at least this is not too useful from the purpose of building memcpy functions (more on this a few lines below).

It's not clear to me that all of this should be tied together. In particular, I'm not sure -fno-builtin-memcpy should imply the compiler never generates a call to memcpy().

As a matter of fact, those are not tied together. There are different use cases with different solutions, the one I'm focusing on here is about preventing the compiler from synthesizing runtime calls because we want to be able to implement them directly from C / C++.
It is orthogonal to having the compiler recognize a piece of code as an IR intrinsic.

On recent x86 chips, you might be able to get away with unconditionally using "rep movs", but generally an efficient memcpy for more than a few bytes is a lot longer than one instruction, and is not something reasonable for the compiler to synthesize inline.

Well it depends. On Haswell and particularly Skylake it's hard to beat rep;movsb for anything bigger than 1k, be it aligned or not.
On other architectures and especially on the ones without ERMSB you have different strategies. Actually this is the very goal of this RFC: if you can inline or use PGO you can do a much better job for small sizes than calling libc's memcpy or inserting rep;movsb.

If we're adding new IR attributes here, we should also consider the interaction with LTO.

Yes this is a very different story, that's why I'm not exploring this route. It's rather possible that it would come with a high maintenance cost as well.

In D61634#1494518, @alexandre.isoard wrote:

I'm not convinced by the approach.

Can it still recognize the loop idiom into memcpy implementation but use memmove (as only memcpy has been blacklisted)?

Yes it can and it's fine, passing src and dst arguments as __restrict will ensure that memcpy is the function to choose in this case.
This attribute is not to be used widely but is directed towards library maintainer that already know what the compiler is allowed to do.

Can it do the same for memmove (for the case without overlap), and end-up with infinite recursion?

I fail to see how this would happen. Could you craft some pseudo code that would exhibit the infinite recursion?
My take on it is that if the compiler can't generate the code it's totally OK to fail with an error message.

The purpose of this RFC it to get some help from the compiler to generate the best code for some building blocks (say copy 4 bytes, copy 8 bytes) and assemble them in a way that maximizes something (code size, runtime for certain parameter distributions).
It would be useful only to libc / libm / compiler-rt implementers to build these functions on top of smaller functions that we know the compiler can produce at compile time.

I don't think we want to use this attribute to generate fully the memcpy. I added the expansion into a rep;movsb for variable sized memcpy only because it's feasible on x86. It's preferable in this case since it has the correct semantic and prevents a compilation error but I would be fine with a compilation error here.

I have a feeling we should pick a stance:

either not allow the compiler to lower a builtin to a call to the library; (my preferred choice, but I'm biased)

From the point of view of LLVM @llvm.memcpy intrinsics has the semantic of memcpy it does not know if it comes from a lowering in the frontend (e.g. passing structures by value) or from a built in.
I believe this is feasible although I fail to see where my proposal falls short. Can you show a code example where the current proposal is problematic?

or not allow the library to use compiler builtins (but LTO flow with the runtime library *already* linked smells like trouble if we go this way).

This is currently generating very poor code because -fno-builtin prevents LLVM from understanding the copy semantic.

The reason for my bias is that I have a multi-memcpy codegen in the compiler to generate those two calls:
memcpy(left,  in, n);
memcpy(right, in, n);
with a single loop.

I don't quite understand how this is linked to the issue at hand. Can you provide more context? Pointers to code?

-fno-builtin* is about preventing clang/llvm from recognizing that a piece of code has the same semantic as a particular IR intrinsic, it has nothing to do with preventing the compiler from generating runtime calls.

It has a dual purpose for C library functions. One, it prevents the compiler from assuming an explicitly written call to that function has some particular semantics. Two, it prevents the compiler from assuming the underlying library function exists for the purpose of optimization. (These are sort of intertwined, but they both matter in this context.)

I believe very few people will use the attribute described in the RFC, it will most probably be library maintainers that already know a good deal of how the compiler is allowed to transform the code.

Sure, I'm happy to assume that memcpy/memset implementations are written using some appropriate subset of C. (We should probably document that subset at some point.)

I still think there are really two things you're trying to accomplish here, which should be handled separately.

Add a function attribute that works like -fno-builtin-memcpy currently does, to prevent the compiler from synthesizing calls to memcpy.
Provide a convenient way to write a small, fixed-length "memcpy", in C and in LLVM IR, that is always expanded to optimal straight-line code (without any calls or loops).

These correspond to proposals (1) and (2) from your RFC; I think we need both to arrive at a reasonable final state.

(The "rep; movs" is a third thing, which I think should also be handled separately, but it sounds like you don't think that's very important.)

In D61634#1498376, @efriedma wrote:

I still think there are really two things you're trying to accomplish here, which should be handled separately.

Add a function attribute that works like -fno-builtin-memcpy currently does, to prevent the compiler from synthesizing calls to memcpy.

Provide a convenient way to write a small, fixed-length "memcpy", in C and in LLVM IR, that is always expanded to optimal straight-line code (without any calls or loops).

These correspond to proposals (1) and (2) from your RFC; I think we need both to arrive at a reasonable final state.

(The "rep; movs" is a third thing, which I think should also be handled separately, but it sounds like you don't think that's very important.)

Thank you for taking the time to comment, your feedback is highly appreciated.

I understand that your main concern is about conflating two orthogonal requirements (namely 1. and 2.) in a single attribute. Is that correct?
From my point of view, the RFC (and this Patch) really is about 1. - because 2. can already be achieved with builtins __builtin_memcpy(dst, src, <compile time size>).

My secondary goals in decreasing priority order are:

do not change the semantic of current builtins,
create a relatively straightforward LLVM patch that does not touch too much code and that is easy review,
do not add more confusion around -fno-builtin-* meaning,
do not add more builtins or IR intrinsics.

What is the main blocker on your end? What would you suggest so we can move forward?

My main blocker is that I want to make sure we're moving in the right direction: towards LLVM IR with clear semantics, towards straightforward rules for writing freestanding C code, and towards solutions which behave appropriately for all targets. There's clearly a problem here that's worth solving, but I want to make sure we're adding small features that interact with existing features in an obvious way. The patch as written is adding a new attribute that changes the semantics of C and LLVM IR in a subtle way that interacts with existing optimizations and lowering in ways that are complex and hard to understand.

I don't want to mix general restrictions on calling C library functions, with restrictions that apply specifically to memcpy: clang generally works on the assumption that a "memcpy" symbol is available in freestanding environments, and we don't really want to change that.

With -fno-builtin, specifically, we currently apply the restriction that optimizations will not introduce memcpy calls that would not exist with optimization disabled. This is sort of artificial, and and maybe a bit confusing, but it seems to work well enough in practice. gcc does something similar.

I don't really want to add an attribute that has a different meaning from -fno-builtin. An attribute that has the same meaning is a lot less problematic: it reduces potential confusion, and solves a related problem that -fno-builtin currently doesn't really work with LTO, because we don't encode it into the IR.

Your current patch is using the "AlwaysInline" flag for SelectionDAG::getMemcpy, which forces a memcpy to be lowered to an inline implementation. In general, this flag is only supported for constant-size memcpy calls; otherwise, on targets where EmitTargetCodeForMemcpy does not have some special lowering, like the x86 "rep;movs", you'll hit an assertion failure. And we don't really want to add an implementation of variable-length memcpy for a lot of targets; it's very inefficient on targets which don't have unaligned memory access. You could try to narrowly fix this to only apply "AlwaysInline" if the size is a constant integer, but there's a related problem: even if a memcpy is constant size in C, optimizations don't promise to preserve that, in general. We could theoretically add such a promise, I guess, but that's awkward: it would conditionally apply to llvm.memcpy where the parameter is already const, so it's not something we can easily verify.

If we added a new builtin llvm.memcpy.inline, or something like that, the end result ends up being simpler in many ways. It has obvious rules for both generating it and lowering it, which don't depend on attributes in the parent function. We could mark the size as "immarg", so we don't have to add special optimization rules for it. And if we have it, we have a "complete" solution for writing memcpy implementations in C: we make __builtin_memcpy, or a new __builtin_inline_memcpy, produce it, and it can be combined with -fno-builtin to ensure we don't produce any calls to the "real" memcpy. The downside of a new builtin, of course, is that we now have two functions with similar semantics, and potentially have to fix a bunch of places to check for both of them.

MemCpyOpt has been mentioned (the pass which transforms memcpy-like loops into llvm.memcpy). If we want it to perform some transform in circumstances where the call "memcpy" isn't available, we have to make sure to restrict it based on the target: in the worst case, on some targets without unaligned memory access, it could just act as a low-quality loop unroller. This applies no matter how the result is actually represented in IR.

In D61634#1500453, @efriedma wrote:

My main blocker is that I want to make sure we're moving in the right direction: towards LLVM IR with clear semantics, towards straightforward rules for writing freestanding C code, and towards solutions which behave appropriately for all targets. There's clearly a problem here that's worth solving, but I want to make sure we're adding small features that interact with existing features in an obvious way. The patch as written is adding a new attribute that changes the semantics of C and LLVM IR in a subtle way that interacts with existing optimizations and lowering in ways that are complex and hard to understand.

This makes a lot of sense, I'm totally on board to reduce entropy and confusion along the way.

I don't want to mix general restrictions on calling C library functions, with restrictions that apply specifically to memcpy: clang generally works on the assumption that a "memcpy" symbol is available in freestanding environments, and we don't really want to change that.

With -fno-builtin, specifically, we currently apply the restriction that optimizations will not introduce memcpy calls that would not exist with optimization disabled. This is sort of artificial, and and maybe a bit confusing, but it seems to work well enough in practice. gcc does something similar.

I don't really want to add an attribute that has a different meaning from -fno-builtin. An attribute that has the same meaning is a lot less problematic: it reduces potential confusion, and solves a related problem that -fno-builtin currently doesn't really work with LTO, because we don't encode it into the IR.

Adding @tejohnson to the conversation.

IIUC you're offering to introduce something like __attribute__((no-builtin-memcpy)) or __attribute__((no-builtin("memcpy"))).
As attributes they would compose nicely with (Thin)LTO.

I believe we still want to turn -fno-builtin flags into attributes so there's no impedance mismatch between the flag and the attribute right?

Your current patch is using the "AlwaysInline" flag for SelectionDAG::getMemcpy, which forces a memcpy to be lowered to an inline implementation. In general, this flag is only supported for constant-size memcpy calls; otherwise, on targets where EmitTargetCodeForMemcpy does not have some special lowering, like the x86 "rep;movs", you'll hit an assertion failure. And we don't really want to add an implementation of variable-length memcpy for a lot of targets; it's very inefficient on targets which don't have unaligned memory access. You could try to narrowly fix this to only apply "AlwaysInline" if the size is a constant integer, but there's a related problem: even if a memcpy is constant size in C, optimizations don't promise to preserve that, in general. We could theoretically add such a promise, I guess, but that's awkward: it would conditionally apply to llvm.memcpy where the parameter is already const, so it's not something we can easily verify.

Fair enough. This patch was really to get the conversation started : I was myself not especially convinced about the approach. Hijacking the AlwaysInline parameter was a shortcut that would not work for other mem function anyways.

If we added a new builtin llvm.memcpy.inline, or something like that, the end result ends up being simpler in many ways. It has obvious rules for both generating it and lowering it, which don't depend on attributes in the parent function. We could mark the size as "immarg", so we don't have to add special optimization rules for it. And if we have it, we have a "complete" solution for writing memcpy implementations in C: we make __builtin_memcpy, or a new __builtin_inline_memcpy, produce it, and it can be combined with -fno-builtin to ensure we don't produce any calls to the "real" memcpy. The downside of a new builtin, of course, is that we now have two functions with similar semantics, and potentially have to fix a bunch of places to check for both of them.

This was one of the approaches I envisioned, it's much cleaner and it's also a lot more work : )
I'm willing to go that route knowing that it would also work for @theraven's use case.

MemCpyOpt has been mentioned (the pass which transforms memcpy-like loops into llvm.memcpy). If we want it to perform some transform in circumstances where the call "memcpy" isn't available, we have to make sure to restrict it based on the target: in the worst case, on some targets without unaligned memory access, it could just act as a low-quality loop unroller. This applies no matter how the result is actually represented in IR.

Yes if we have to generate loops it need to happen before SelectionDAG.

In this framework -ffreestanding stays as it is - namely it implies -fno-builtin. I still think that the semantic is somewhat surprising and unclear to the newcomer but I guess we can't do anything about it at this point - apart adding more documentation.

Lastly, if we are to introduce new IR intrinsics, how about adding some for memcmp and bcmp? It does not have to be part of this patch but I think it's worth considering for consistency.

In D61634#1501066, @gchatelet wrote:

In D61634#1500453, @efriedma wrote:

My main blocker is that I want to make sure we're moving in the right direction: towards LLVM IR with clear semantics, towards straightforward rules for writing freestanding C code, and towards solutions which behave appropriately for all targets. There's clearly a problem here that's worth solving, but I want to make sure we're adding small features that interact with existing features in an obvious way. The patch as written is adding a new attribute that changes the semantics of C and LLVM IR in a subtle way that interacts with existing optimizations and lowering in ways that are complex and hard to understand.

This makes a lot of sense, I'm totally on board to reduce entropy and confusion along the way.

I don't want to mix general restrictions on calling C library functions, with restrictions that apply specifically to memcpy: clang generally works on the assumption that a "memcpy" symbol is available in freestanding environments, and we don't really want to change that.

With -fno-builtin, specifically, we currently apply the restriction that optimizations will not introduce memcpy calls that would not exist with optimization disabled. This is sort of artificial, and and maybe a bit confusing, but it seems to work well enough in practice. gcc does something similar.

I don't really want to add an attribute that has a different meaning from -fno-builtin. An attribute that has the same meaning is a lot less problematic: it reduces potential confusion, and solves a related problem that -fno-builtin currently doesn't really work with LTO, because we don't encode it into the IR.

Adding @tejohnson to the conversation.

IIUC you're offering to introduce something like __attribute__((no-builtin-memcpy)) or __attribute__((no-builtin("memcpy"))).
As attributes they would compose nicely with (Thin)LTO.

I believe we still want to turn -fno-builtin flags into attributes so there's no impedance mismatch between the flag and the attribute right?

I have a related patch that turns -fno-builtin* options into module flags, and uses them in the LTO backends. This addresses current issues with these flags not working well with LTO.
See https://reviews.llvm.org/D60162

I have a related patch that turns -fno-builtin* options into module flags

Do you have any opinion on representing -fno-builtin using a module flag vs. a function attribute in IR? It seems generally more flexible and easier to reason about a function attribute from my perspective. But I might be missing something about the semantics of -fno-builtin that would make that representation awkward. Or I guess it might just be more work to implement, given we have some IPO passes that use TargetLibraryInfo?

In D61634#1502043, @efriedma wrote:

I have a related patch that turns -fno-builtin* options into module flags

Do you have any opinion on representing -fno-builtin using a module flag vs. a function attribute in IR? It seems generally more flexible and easier to reason about a function attribute from my perspective. But I might be missing something about the semantics of -fno-builtin that would make that representation awkward. Or I guess it might just be more work to implement, given we have some IPO passes that use TargetLibraryInfo?

I think that a function attribute would be better. We generally use these flags only in the context of certain translation units, and when we use LTO, it would be sad if we had to take the most-conservative settings across the entire application. When we insert new function call to a standard library, we always do it in the context of some function. We'd probably need to block inlining in some cases, but that's better than a global conservative setting.

In D61634#1502138, @hfinkel wrote:

In D61634#1502043, @efriedma wrote:

I have a related patch that turns -fno-builtin* options into module flags

Do you have any opinion on representing -fno-builtin using a module flag vs. a function attribute in IR? It seems generally more flexible and easier to reason about a function attribute from my perspective. But I might be missing something about the semantics of -fno-builtin that would make that representation awkward. Or I guess it might just be more work to implement, given we have some IPO passes that use TargetLibraryInfo?

I think that a function attribute would be better. We generally use these flags only in the context of certain translation units, and when we use LTO, it would be sad if we had to take the most-conservative settings across the entire application. When we insert new function call to a standard library, we always do it in the context of some function. We'd probably need to block inlining in some cases, but that's better than a global conservative setting.

Using function level attributes instead of module flags does provide finer grained control and avoids the conservativeness when merging IR for LTO. The downsides I see, mostly just in terms of the engineering effort to get this to work, are:

need to prevent inlining with different attributes
currently the TargetLibraryInfo is constructed on a per-module basis. Presumably it would instead need to be created per Function - this one in particular seems like it would require fairly extensive changes.

tschuett added a subscriber: tschuett.May 14 2019, 11:45 PM

In D61634#1502201, @tejohnson wrote:

Using function level attributes instead of module flags does provide finer grained control and avoids the conservativeness when merging IR for LTO. The downsides I see, mostly just in terms of the engineering effort to get this to work, are:

need to prevent inlining with different attributes

IIUC this is needed regardless of the proposed change. Correct?

currently the TargetLibraryInfo is constructed on a per-module basis. Presumably it would instead need to be created per Function - this one in particular seems like it would require fairly extensive changes.

Yes this one is a bit worrying.
I think we can discard right away any solution that would mutate or create a TLI on a per function basis.
Another design would be something like the following:

auto FunctionTLI = ModuleTLI.createCustomizedTLI(F);

where FunctionTLI is itself a TargetLibraryInfo and calls to FunctionTLI would either return the function customizations or delegate to the module's TLI. WDYT?

I'm unsure if we want to support function level attribute right away or if it's OK to be in an intermediate state with only module level attributes.

In D61634#1502685, @gchatelet wrote:

In D61634#1502201, @tejohnson wrote:

Using function level attributes instead of module flags does provide finer grained control and avoids the conservativeness when merging IR for LTO. The downsides I see, mostly just in terms of the engineering effort to get this to work, are:

need to prevent inlining with different attributes

IIUC this is needed regardless of the proposed change. Correct?

With the module flags approach, no - because there won't be fine grained enough info to do so. Any merged IR will need to use the conservatively set merged flag for the whole module. Or did you mean in comparison with the approach in this patch? I haven't looked at this one in any detail yet.

currently the TargetLibraryInfo is constructed on a per-module basis. Presumably it would instead need to be created per Function - this one in particular seems like it would require fairly extensive changes.

Yes this one is a bit worrying.
I think we can discard right away any solution that would mutate or create a TLI on a per function basis.
Another design would be something like the following:
auto FunctionTLI = ModuleTLI.createCustomizedTLI(F);
where FunctionTLI is itself a TargetLibraryInfo and calls to FunctionTLI would either return the function customizations or delegate to the module's TLI. WDYT?

I don't think this makes it much easier - all TLI users still need to be taught to get and use this new Function level TLI. I guess for Function (or lower) passes it would be fairly straightforward, but for things like Module level or SCC passes it will require more wiring to ensure that the FunctionTLI is accessed in the appropriate places. Doable just a lot of manual changes.

I'm unsure if we want to support function level attribute right away or if it's OK to be in an intermediate state with only module level attributes.

I'm interested in thoughts from other developers here. The module flag change is straightforward, but unnecessary churn if we want to go the function attribute route. Which despite the work seems like the best long term approach...

In D61634#1502201, @tejohnson wrote:

In D61634#1502138, @hfinkel wrote:

In D61634#1502043, @efriedma wrote:

I have a related patch that turns -fno-builtin* options into module flags

Do you have any opinion on representing -fno-builtin using a module flag vs. a function attribute in IR? It seems generally more flexible and easier to reason about a function attribute from my perspective. But I might be missing something about the semantics of -fno-builtin that would make that representation awkward. Or I guess it might just be more work to implement, given we have some IPO passes that use TargetLibraryInfo?

I think that a function attribute would be better. We generally use these flags only in the context of certain translation units, and when we use LTO, it would be sad if we had to take the most-conservative settings across the entire application. When we insert new function call to a standard library, we always do it in the context of some function. We'd probably need to block inlining in some cases, but that's better than a global conservative setting.

Using function level attributes instead of module flags does provide finer grained control and avoids the conservativeness when merging IR for LTO. The downsides I see, mostly just in terms of the engineering effort to get this to work, are:

need to prevent inlining with different attributes

I think that this should be relatively straightforward. You just need to update AttributeFuncs::areInlineCompatible and/or AttributeFuncs::mergeAttributesForInlining by adding a new MergeRule in include/llvm/IR/Attributes.td and writing a function like adjustCallerStackProbeSize.

currently the TargetLibraryInfo is constructed on a per-module basis. Presumably it would instead need to be created per Function - this one in particular seems like it would require fairly extensive changes.

Interesting point. The largest issue I see is that we need TLI available from loop passes, etc., and we can't automatically recompute a function-level analysis there. We need to make sure that it's always available and not invalidated. TLI is one of those analysis passes, being derived only from things which don't change (i.e., the target triple), or things that change very rarely (e.g., function attributes), that we probably don't want to require all passes to explicitly say that they preserve it (not that the mechanical change to all existing passes is hard, but it's easy to forget), so I think we'd like something like the CFG-only concept in the current passes, but stronger and perhaps turned on by default, for this kind of "attributes-only" pass. (@chandlerc , thoughts on this?).

Sorry I've been a bit slow to respond here...

In D61634#1503089, @hfinkel wrote:

In D61634#1502201, @tejohnson wrote:

In D61634#1502138, @hfinkel wrote:

In D61634#1502043, @efriedma wrote:

I have a related patch that turns -fno-builtin* options into module flags

Do you have any opinion on representing -fno-builtin using a module flag vs. a function attribute in IR? It seems generally more flexible and easier to reason about a function attribute from my perspective. But I might be missing something about the semantics of -fno-builtin that would make that representation awkward. Or I guess it might just be more work to implement, given we have some IPO passes that use TargetLibraryInfo?

I think that a function attribute would be better. We generally use these flags only in the context of certain translation units, and when we use LTO, it would be sad if we had to take the most-conservative settings across the entire application. When we insert new function call to a standard library, we always do it in the context of some function. We'd probably need to block inlining in some cases, but that's better than a global conservative setting.

FWIW, I definitely agree here. This really is the end state we're going to find ourselves in and we should probably go directly there.

Using function level attributes instead of module flags does provide finer grained control and avoids the conservativeness when merging IR for LTO. The downsides I see, mostly just in terms of the engineering effort to get this to work, are:

need to prevent inlining with different attributes

I think that this should be relatively straightforward. You just need to update AttributeFuncs::areInlineCompatible and/or AttributeFuncs::mergeAttributesForInlining by adding a new MergeRule in include/llvm/IR/Attributes.td and writing a function like adjustCallerStackProbeSize.

currently the TargetLibraryInfo is constructed on a per-module basis. Presumably it would instead need to be created per Function - this one in particular seems like it would require fairly extensive changes.

Interesting point. The largest issue I see is that we need TLI available from loop passes, etc., and we can't automatically recompute a function-level analysis there. We need to make sure that it's always available and not invalidated. TLI is one of those analysis passes, being derived only from things which don't change (i.e., the target triple), or things that change very rarely (e.g., function attributes), that we probably don't want to require all passes to explicitly say that they preserve it (not that the mechanical change to all existing passes is hard, but it's easy to forget), so I think we'd like something like the CFG-only concept in the current passes, but stronger and perhaps turned on by default, for this kind of "attributes-only" pass. (@chandlerc , thoughts on this?).

Yep, this makes sense.

The new PM makes this quite easy. The analysis itself gets to implement the invalidation hook, and say "nope, I'm not invalidated". In fact, in the new PM, TargetLibraryInfo already works this way. We build an instance per function and say it is never invalidated.

However, they are just trivial wrappers around shared implementations, so it will still require some non-trivial changes. Will need to remove the module-based access and move clients over to provide a function when they query it, etc.

IIRC, TargetTransformInfo already basically works exactly this way in both old and new PMs and we should be able to look at exactly the techniques it uses in both pass managers to build an effective way to manage these per-function.

AFAIU here is a coarse plan of what needs to happen

Add a no-builtin clang function attribute that has the same semantic as the [no-builtin cmd line argument](https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-fbuiltin)
Propagate it to the IR.
- In the light of recent discussions and as @theraven suggested it seems more appropriate to encode them as individual IR attributes (e.g. "no-builtin-memcpy", "no-builtin-sqrt", etc..)
Propagate/merge the no-builtin IR attribute for LTO by "updating AttributeFuncs::areInlineCompatible and/or AttributeFuncs::mergeAttributesForInlining and adding a new MergeRule in include/llvm/IR/Attributes.td and writing a function like adjustCallerStackProbeSize."
Get inspiration from TargetTransformInfo to get TargetLibraryInfo on a per function basis for all passes and respect the IR attribute.

I'm not familiar with 3 and 4 but I can definitely have a look.
I'll update this patch to do 1 and 2 in the meantime. @tejohnson let me know how you want to proceed for your related patch. I'm happy to help if I can.

Use no-builtin instead of no-runtime-for.
Use one attribute per runtime function to make merging easier.

The patch is still WIP and needs more work.

Harbormaster completed remote builds in B32396: Diff 200998.May 23 2019, 9:20 AM

In D61634#1512020, @gchatelet wrote:

AFAIU here is a coarse plan of what needs to happen

Add a no-builtin clang function attribute that has the same semantic as the [no-builtin cmd line argument](https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-fbuiltin)

Propagate it to the IR.

In the light of recent discussions and as @theraven suggested it seems more appropriate to encode them as individual IR attributes (e.g. "no-builtin-memcpy", "no-builtin-sqrt", etc..)

Propagate/merge the no-builtin IR attribute for LTO by "updating AttributeFuncs::areInlineCompatible and/or AttributeFuncs::mergeAttributesForInlining and adding a new MergeRule in include/llvm/IR/Attributes.td and writing a function like adjustCallerStackProbeSize."

This one isn't about LTO, but rather the inliner. You could have different functions in the same module even without LTO that have incompatible no-builtin attributes. There isn't any propagation required for LTO.

Get inspiration from TargetTransformInfo to get TargetLibraryInfo on a per function basis for all passes and respect the IR attribute.

I'm not familiar with 3 and 4 but I can definitely have a look.
I'll update this patch to do 1 and 2 in the meantime. @tejohnson let me know how you want to proceed for your related patch. I'm happy to help if I can.

I will mark that one obsolete. I can work on 4, it may just take some time to get it all plumbed.

tejohnson mentioned this in D60162: [ThinLTO] Add module flags for TargetLibraryInfoImpl and use in LTO backends.May 23 2019, 10:00 PM

I have a question about qsort.. If we provide own implementation of qsort and replace calls to libc's qsort to our qsort, we could fully inline cmp function then. Ideas?

In D61634#1517228, @xbolva00 wrote:

I have a question about qsort.. If we provide own implementation of qsort and replace calls to libc's qsort to our qsort, we could fully inline cmp function then. Ideas?

qsort would seem out of scope here since this is mostly about string.h style functions, more specifically memcpy/memmove. Things like memset would also fall under this category if covered in subsequent diffs.

More specifically it's functions that can benefit a lot from backend specific features (NEON, SSE, etc) especially for larger operations. Such functions are also tricky since their performance is subject to alignment in a lot of cases (usually for SIMD operations and SIMD operations themselves may be undesirable in small cases). At the same time they are extremely important and even with ifunc or alikes, most standard libraries would not match a highly optimized version for the target that doesn't require linking (and thus branches), where decisions about which path to take (SIMD or no SIMD; SSE or AVX or MOV REP) can be determined at compilation time, eliminating the need for a lot of runtime alignment checks (more branches), which are used by standard library to determine whether a SIMD extension may be used and if so, if stack alignment is appropriate and if not, correct it. Eliminating that logic where possible would be desirable.

(I hope my answer wasn't totally off-base)

Add documentation.
Fix permissive HasNoRuntimeAttribute
Mark interleave as disabled in the documentation.
Use no-builtin instead of no-runtime-for.
Adding an llvm.memcpy.inline intrinsic.
Adding __builtin_memcpy_inline clang builtin.

Harbormaster completed remote builds in B33006: Diff 203386.Jun 6 2019, 9:21 AM

In D61634#1515176, @tejohnson wrote:

In D61634#1512020, @gchatelet wrote:

AFAIU here is a coarse plan of what needs to happen

Add a no-builtin clang function attribute that has the same semantic as the [no-builtin cmd line argument](https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-fbuiltin)

Propagate it to the IR.

In the light of recent discussions and as @theraven suggested it seems more appropriate to encode them as individual IR attributes (e.g. "no-builtin-memcpy", "no-builtin-sqrt", etc..)

Propagate/merge the no-builtin IR attribute for LTO by "updating AttributeFuncs::areInlineCompatible and/or AttributeFuncs::mergeAttributesForInlining and adding a new MergeRule in include/llvm/IR/Attributes.td and writing a function like adjustCallerStackProbeSize."

This one isn't about LTO, but rather the inliner. You could have different functions in the same module even without LTO that have incompatible no-builtin attributes. There isn't any propagation required for LTO.

Get inspiration from TargetTransformInfo to get TargetLibraryInfo on a per function basis for all passes and respect the IR attribute.

I'm not familiar with 3 and 4 but I can definitely have a look.
I'll update this patch to do 1 and 2 in the meantime. @tejohnson let me know how you want to proceed for your related patch. I'm happy to help if I can.

I will mark that one obsolete. I can work on 4, it may just take some time to get it all plumbed.

Checking in to see where we are on this issue. I haven't had any time to work on 4 but hopefully can start on that soon. But it needs the first part done to be effective.

In D61634#1586047, @tejohnson wrote:

Checking in to see where we are on this issue. I haven't had any time to work on 4 but hopefully can start on that soon. But it needs the first part done to be effective.

Thx for the heads up @tejohnson.
The patch is missing a bit of documentation but it shouldn't be too far from complete:

it adds a clang function attribute, e.g. __attribute__((no_builtin("memcpy")))
instructs clang to compose individual IR attributes from the clang attribute above, e.g. no-builtin-memcpy, no-builtin-memset, no-builtin-sqrt...
adds a specific builtin to clang for the memcpy case __builtin_memcpy_inline
adds an LLVM IR intrinsic int_memcpy_inline
adds an LLVM Instruction MemCpyInlineInst
instructs LLVM to forward the no-builtin-memcpy IR attribute from the function declaration to the actual memcpy calls inside the function's body (same for memset and memmove)
adds code to turn the MemCpyInlineInst into code, using DAG.getMemcpy with always_inline set.

Left to do:

finish implementing memset / memmove
check attributes and reject the one that are not implemented
add documentation

There's too many things going on in this patch and it may worth splitting it.

tejohnson mentioned this in D66428: Change TargetLibraryInfo analysis passes to always require Function.Aug 19 2019, 10:14 AM

I had some time to work on this finally late last week. I decided the most straightforward thing was to implement the necessary interface changes to the TLI analysis to make it require a Function (without any changes yet to how that analysis operates). See D66428 that I just mailed for review. That takes care of the most widespread changes needed for this migration, and afterwards we can change the analysis to look at the function attributes and make a truly per-function TLI.

Herald added a reviewer: jdoerfert. · View Herald TranscriptAug 19 2019, 10:17 AM

tejohnson mentioned this in rL371284: Change TargetLibraryInfo analysis passes to always require Function.Sep 6 2019, 8:09 PM

tejohnson mentioned this in rG9c27b59cec76: Change TargetLibraryInfo analysis passes to always require Function.

tejohnson mentioned this in D67923: [TLI] Support for per-Function TLI that overrides available libfuncs.Sep 23 2019, 8:53 AM

In D61634#1635595, @tejohnson wrote:

I had some time to work on this finally late last week. I decided the most straightforward thing was to implement the necessary interface changes to the TLI analysis to make it require a Function (without any changes yet to how that analysis operates). See D66428 that I just mailed for review. That takes care of the most widespread changes needed for this migration, and afterwards we can change the analysis to look at the function attributes and make a truly per-function TLI.

D66428 went in a few weeks ago at r371284, and I just mailed the follow on patch D67923 which will adds the support into the TLI analysis to use the Function to override the available builtins (with some of the code stubbed out since we don't yet have those per-Function attributes finalized).

@gchatelet where are you at on finalizing this patch? Also, I mentioned this offline but to follow up here: I think we will want an attribute to represent -fno-builtins (so that it doesn't need to be expanded out into the full list of individual no-builtin-{func} attributes, which would be both more verbose and less efficient, as well as being less backward compatible when new builtin funcs are added).

In D61634#1679331, @tejohnson wrote:

In D61634#1635595, @tejohnson wrote:

I had some time to work on this finally late last week. I decided the most straightforward thing was to implement the necessary interface changes to the TLI analysis to make it require a Function (without any changes yet to how that analysis operates). See D66428 that I just mailed for review. That takes care of the most widespread changes needed for this migration, and afterwards we can change the analysis to look at the function attributes and make a truly per-function TLI.

D66428 went in a few weeks ago at r371284, and I just mailed the follow on patch D67923 which will adds the support into the TLI analysis to use the Function to override the available builtins (with some of the code stubbed out since we don't yet have those per-Function attributes finalized).

@gchatelet where are you at on finalizing this patch? Also, I mentioned this offline but to follow up here: I think we will want an attribute to represent -fno-builtins (so that it doesn't need to be expanded out into the full list of individual no-builtin-{func} attributes, which would be both more verbose and less efficient, as well as being less backward compatible when new builtin funcs are added).

I'll break this patch in several pieces. The first one is to add the no_builtin attribute, see https://reviews.llvm.org/D61634.

In D61634#1682660, @gchatelet wrote:

In D61634#1679331, @tejohnson wrote:

In D61634#1635595, @tejohnson wrote:

I had some time to work on this finally late last week. I decided the most straightforward thing was to implement the necessary interface changes to the TLI analysis to make it require a Function (without any changes yet to how that analysis operates). See D66428 that I just mailed for review. That takes care of the most widespread changes needed for this migration, and afterwards we can change the analysis to look at the function attributes and make a truly per-function TLI.

D66428 went in a few weeks ago at r371284, and I just mailed the follow on patch D67923 which will adds the support into the TLI analysis to use the Function to override the available builtins (with some of the code stubbed out since we don't yet have those per-Function attributes finalized).

@gchatelet where are you at on finalizing this patch? Also, I mentioned this offline but to follow up here: I think we will want an attribute to represent -fno-builtins (so that it doesn't need to be expanded out into the full list of individual no-builtin-{func} attributes, which would be both more verbose and less efficient, as well as being less backward compatible when new builtin funcs are added).

I'll break this patch in several pieces. The first one is to add the no_builtin attribute, see https://reviews.llvm.org/D61634.

Are you planning to add a follow on patch that translates the various -fno-builtin* options to these attributes? Once that is done I can refine D67923 to actually set the TLI availability from the attributes and remove the clang functionality that does this from the options.

In D61634#1732884, @tejohnson wrote:

In D61634#1682660, @gchatelet wrote:

In D61634#1679331, @tejohnson wrote:

In D61634#1635595, @tejohnson wrote:

I had some time to work on this finally late last week. I decided the most straightforward thing was to implement the necessary interface changes to the TLI analysis to make it require a Function (without any changes yet to how that analysis operates). See D66428 that I just mailed for review. That takes care of the most widespread changes needed for this migration, and afterwards we can change the analysis to look at the function attributes and make a truly per-function TLI.

D66428 went in a few weeks ago at r371284, and I just mailed the follow on patch D67923 which will adds the support into the TLI analysis to use the Function to override the available builtins (with some of the code stubbed out since we don't yet have those per-Function attributes finalized).

@gchatelet where are you at on finalizing this patch? Also, I mentioned this offline but to follow up here: I think we will want an attribute to represent -fno-builtins (so that it doesn't need to be expanded out into the full list of individual no-builtin-{func} attributes, which would be both more verbose and less efficient, as well as being less backward compatible when new builtin funcs are added).

I'll break this patch in several pieces. The first one is to add the no_builtin attribute, see https://reviews.llvm.org/D61634.

Are you planning to add a follow on patch that translates the various -fno-builtin* options to these attributes? Once that is done I can refine D67923 to actually set the TLI availability from the attributes and remove the clang functionality that does this from the options.

The no-builtin attribute is in as of rG98f3151a7dded8838fafcb5f46e6c8358def96b8. I've started working on getting the -fno-builtin flag propagate to no-builtin function attributes. I'll ping back when its ready.

tejohnson mentioned this in rG878ab6df033d: [TLI] Support for per-Function TLI that overrides available libfuncs.Dec 16 2019, 9:20 AM

gchatelet mentioned this in D71710: [instrinsics] Add @llvm.memcpy.inline instrinsics.Dec 19 2019, 8:36 AM

This has enough active reviewers as it is.

In D61634#1515176, @tejohnson wrote:

In D61634#1512020, @gchatelet wrote:

AFAIU here is a coarse plan of what needs to happen

I've listed below what I believe is the status:

Add a no-builtin clang function attribute that has the same semantic as the [no-builtin cmd line argument](https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-fbuiltin)

Done (D68028 committed at bd8791610948).

Propagate it to the IR.

In the light of recent discussions and as @theraven suggested it seems more appropriate to encode them as individual IR attributes (e.g. "no-builtin-memcpy", "no-builtin-sqrt", etc..)

Done (also in D68028 committed at bd8791610948).
Additionally the -fno-builtin* options were translated to the IR attributes in D71193 (committed at 0508c994f0b1).

Propagate/merge the no-builtin IR attribute for LTO by "updating AttributeFuncs::areInlineCompatible and/or AttributeFuncs::mergeAttributesForInlining and adding a new MergeRule in include/llvm/IR/Attributes.td and writing a function like adjustCallerStackProbeSize."

This one isn't about LTO, but rather the inliner. You could have different functions in the same module even without LTO that have incompatible no-builtin attributes. There isn't any propagation required for LTO.

Not done yet - I can work on this.

Get inspiration from TargetTransformInfo to get TargetLibraryInfo on a per function basis for all passes and respect the IR attribute.

Done (D67923 was the last patch in the series to enable this, committed at 878ab6df033d).

I'm not quite sure where D71710 ([instrinsics] Add @llvm.memcpy.inline instrinsics) fits in to the above list.

Anything else missing?

Thx for the summary @tejohnson.

In D61634#1808265, @tejohnson wrote:

Propagate/merge the no-builtin IR attribute for LTO by "updating AttributeFuncs::areInlineCompatible and/or AttributeFuncs::mergeAttributesForInlining and adding a new MergeRule in include/llvm/IR/Attributes.td and writing a function like adjustCallerStackProbeSize."

This one isn't about LTO, but rather the inliner. You could have different functions in the same module even without LTO that have incompatible no-builtin attributes. There isn't any propagation required for LTO.

Not done yet - I can work on this.

That would be great!

Get inspiration from TargetTransformInfo to get TargetLibraryInfo on a per function basis for all passes and respect the IR attribute.

Done (D67923 was the last patch in the series to enable this, committed at 878ab6df033d).

I'm not quite sure where D71710 ([instrinsics] Add @llvm.memcpy.inline instrinsics) fits in to the above list.

I believe it does.

Anything else missing?

Yes when the intrinsic is in we need a way to access it from C++ so a Clang builtin is necessary. I'll take care of it.

ychen added a subscriber: ychen.Jan 9 2020, 9:11 AM

gchatelet mentioned this in rG879c825cb808: [instrinsics] Add @llvm.memcpy.inline instrinsics.Jan 28 2020, 12:43 AM

gchatelet mentioned this in D73543: [clang] Add support for __builtin_memcpy_inline.Jan 28 2020, 4:03 AM

arichardson added a subscriber: arichardson.Jan 28 2020, 2:16 PM

tejohnson mentioned this in D74162: [Inliner] Inlining should honor nobuiltin attributes.Feb 6 2020, 1:36 PM

gchatelet mentioned this in rGd65bbf81f8be: [clang] Add support for __builtin_memcpy_inline.Feb 7 2020, 3:11 PM

This has been implemented in the following patches:

tejohnson mentioned this in rGf9ca75f19bab: [Inliner] Inlining should honor nobuiltin attributes.Feb 28 2020, 7:44 AM

The last patch from my side just went in (D74162: [Inliner] Inlining should honor nobuiltin attributes). So I think this is now complete.

aaron.ballman mentioned this in D124701: [clang] Honor __attribute__((no_builtin("foo"))) on functions.May 9 2022, 9:30 AM

Revision Contents

Path

Size

clang/

include/

clang/

Basic/

Attr.td

10 lines

AttrDocs.td

12 lines

Builtins.def

1 line

lib/

CodeGen/

CGBuilder.h

8 lines

CGBuiltin.cpp

13 lines

CGCall.cpp

9 lines

Sema/

SemaDeclAttr.cpp

27 lines

llvm/

docs/

LangRef.rst

25 lines

include/

llvm/

IR/

IRBuilder.h

3 lines

IntrinsicInst.h

23 lines

Intrinsics.td

7 lines

lib/

CodeGen/

SelectionDAG/

SelectionDAGBuilder.cpp

9 lines

IR/

IRBuilder.cpp

51 lines

Target/

X86/

X86SelectionDAGInfo.cpp

4 lines

test/

CodeGen/

X86/

memcpy-inline.ll

14 lines

memcpy.ll

26 lines

Diff 203386

clang/include/clang/Basic/Attr.td

	Show First 20 Lines • Show All 298 Lines • ▼ Show 20 Lines
	def OpenCL : LangOpt<"OpenCL">;			def OpenCL : LangOpt<"OpenCL">;
	def RenderScript : LangOpt<"RenderScript">;			def RenderScript : LangOpt<"RenderScript">;
	def ObjC : LangOpt<"ObjC">;			def ObjC : LangOpt<"ObjC">;
	def BlocksSupported : LangOpt<"Blocks">;			def BlocksSupported : LangOpt<"Blocks">;
	def ObjCAutoRefCount : LangOpt<"ObjCAutoRefCount">;			def ObjCAutoRefCount : LangOpt<"ObjCAutoRefCount">;
	def ObjCNonFragileRuntime : LangOpt<"ObjCNonFragileRuntime",			def ObjCNonFragileRuntime : LangOpt<"ObjCNonFragileRuntime",
	"LangOpts.ObjCRuntime.allowsClassStubs()">;			"LangOpts.ObjCRuntime.allowsClassStubs()">;

	// Language option for CMSE extensions
	def Cmse : LangOpt<"Cmse">;

	// Defines targets for target-specific attributes. Empty lists are unchecked.			// Defines targets for target-specific attributes. Empty lists are unchecked.
	class TargetSpec {			class TargetSpec {
	// Specifies Architectures for which the target applies, based off the			// Specifies Architectures for which the target applies, based off the
	// ArchType enumeration in Triple.h.			// ArchType enumeration in Triple.h.
	list<string> Arches = [];			list<string> Arches = [];
	// Specifies Operating Systems for which the target applies, based off the			// Specifies Operating Systems for which the target applies, based off the
	// OSType enumeration in Triple.h			// OSType enumeration in Triple.h
	list<string> OSes;			list<string> OSes;
	▲ Show 20 Lines • Show All 2,929 Lines • ▼ Show 20 Lines
	}			}

	def ObjCExternallyRetained : InheritableAttr {			def ObjCExternallyRetained : InheritableAttr {
	let LangOpts = [ObjCAutoRefCount];			let LangOpts = [ObjCAutoRefCount];
	let Spellings = [Clang<"objc_externally_retained">];			let Spellings = [Clang<"objc_externally_retained">];
	let Subjects = SubjectList<[NonParmVar, Function, Block, ObjCMethod]>;			let Subjects = SubjectList<[NonParmVar, Function, Block, ObjCMethod]>;
	let Documentation = [ObjCExternallyRetainedDocs];			let Documentation = [ObjCExternallyRetainedDocs];
	}			}

				def NoBuiltin : InheritableAttr {
				let Spellings = [Clang<"no_builtin">];
				let Args = [VariadicStringArgument<"FunctionNames">];
				let Subjects = SubjectList<[Function]>;
				let Documentation = [NoBuiltinDocs];
				}

clang/include/clang/Basic/AttrDocs.td

	Show First 20 Lines • Show All 3,751 Lines • ▼ Show 20 Lines
	For more information see			For more information see
	`gcc documentation <https://gcc.gnu.org/onlinedocs/gcc-7.2.0/gcc/Microsoft-Windows-Variable-Attributes.html>`_			`gcc documentation <https://gcc.gnu.org/onlinedocs/gcc-7.2.0/gcc/Microsoft-Windows-Variable-Attributes.html>`_
	or `msvc documentation <https://docs.microsoft.com/pl-pl/cpp/cpp/selectany>`_.			or `msvc documentation <https://docs.microsoft.com/pl-pl/cpp/cpp/selectany>`_.
	}]; }			}]; }

	def WebAssemblyImportModuleDocs : Documentation {			def WebAssemblyImportModuleDocs : Documentation {
	let Category = DocCatFunction;			let Category = DocCatFunction;
	let Content = [{			let Content = [{
	Clang supports the ``__attribute__((import_module(<module_name>)))``			Clang supports the ``__attribute__((import_module(<module_name>)))``
	attribute for the WebAssembly target. This attribute may be attached to a			attribute for the WebAssembly target. This attribute may be attached to a
	function declaration, where it modifies how the symbol is to be imported			function declaration, where it modifies how the symbol is to be imported
	within the WebAssembly linking environment.			within the WebAssembly linking environment.

	WebAssembly imports use a two-level namespace scheme, consisting of a module			WebAssembly imports use a two-level namespace scheme, consisting of a module
	name, which typically identifies a module from which to import, and a field			name, which typically identifies a module from which to import, and a field
	name, which typically identifies a field from that module to import. By			name, which typically identifies a field from that module to import. By
	default, module names for C/C++ symbols are assigned automatically by the			default, module names for C/C++ symbols are assigned automatically by the
	linker. This attribute can be used to override the default behavior, and			linker. This attribute can be used to override the default behavior, and
	reuqest a specific module name be used instead.			reuqest a specific module name be used instead.
	}];			}];
	}			}

	def WebAssemblyImportNameDocs : Documentation {			def WebAssemblyImportNameDocs : Documentation {
	let Category = DocCatFunction;			let Category = DocCatFunction;
	let Content = [{			let Content = [{
	Clang supports the ``__attribute__((import_name(<name>)))``			Clang supports the ``__attribute__((import_name(<name>)))``
	attribute for the WebAssembly target. This attribute may be attached to a			attribute for the WebAssembly target. This attribute may be attached to a
	function declaration, where it modifies how the symbol is to be imported			function declaration, where it modifies how the symbol is to be imported
	within the WebAssembly linking environment.			within the WebAssembly linking environment.

	WebAssembly imports use a two-level namespace scheme, consisting of a module			WebAssembly imports use a two-level namespace scheme, consisting of a module
	name, which typically identifies a module from which to import, and a field			name, which typically identifies a module from which to import, and a field
	name, which typically identifies a field from that module to import. By			name, which typically identifies a field from that module to import. By
	default, field names for C/C++ symbols are the same as their C/C++ symbol			default, field names for C/C++ symbols are the same as their C/C++ symbol
	▲ Show 20 Lines • Show All 187 Lines • ▼ Show 20 Lines
	actual callback. Inspected, captured, or modified parameters shall not be			actual callback. Inspected, captured, or modified parameters shall not be
	listed in the ``callback`` metadata.			listed in the ``callback`` metadata.

	Example encodings for the callback performed by `pthread_create` are shown			Example encodings for the callback performed by `pthread_create` are shown
	below. The explicit attribute annotation indicates that the third parameter			below. The explicit attribute annotation indicates that the third parameter
	(`start_routine`) is called zero or more times by the `pthread_create` function,			(`start_routine`) is called zero or more times by the `pthread_create` function,
	and that the fourth parameter (`arg`) is passed along. Note that the callback			and that the fourth parameter (`arg`) is passed along. Note that the callback
	behavior of `pthread_create` is automatically recognized by Clang. In addition,			behavior of `pthread_create` is automatically recognized by Clang. In addition,
	the declarations of `__kmpc_fork_teams` and `__kmpc_fork_call`, generated for			the declarations of `__kmpc_fork_teams` and `__kmpc_fork_call`, generated for
	`#pragma omp target teams` and `#pragma omp parallel`, respectively, are also			`#pragma omp target teams` and `#pragma omp parallel`, respectively, are also
	automatically recognized as broker functions. Further functions might be added			automatically recognized as broker functions. Further functions might be added
	in the future.			in the future.

	.. code-block:: c			.. code-block:: c

	__attribute__((callback (start_routine, arg)))			__attribute__((callback (start_routine, arg)))
	int pthread_create(pthread_t thread, const pthread_attr_t attr,			int pthread_create(pthread_t thread, const pthread_attr_t attr,
	▲ Show 20 Lines • Show All 162 Lines • ▼ Show 20 Lines
	Studio debugger uses this information to `profile memory usage`_.			Studio debugger uses this information to `profile memory usage`_.

	.. _profile memory usage: https://docs.microsoft.com/en-us/visualstudio/profiling/memory-usage			.. _profile memory usage: https://docs.microsoft.com/en-us/visualstudio/profiling/memory-usage

	This attribute does not affect optimizations in any way, unlike GCC's			This attribute does not affect optimizations in any way, unlike GCC's
	``__attribute__((malloc))``.			``__attribute__((malloc))``.
	}];			}];
	}			}

				def NoBuiltinDocs : Documentation {
				let Category = DocCatFunction;
				let Content = [{
				}];
				}

clang/include/clang/Basic/Builtins.def

	Show First 20 Lines • Show All 498 Lines • ▼ Show 20 Lines
	BUILTIN(__builtin_longjmp, "vv**i", "r")			BUILTIN(__builtin_longjmp, "vv**i", "r")
	BUILTIN(__builtin_unwind_init, "v", "")			BUILTIN(__builtin_unwind_init, "v", "")
	BUILTIN(__builtin_eh_return_data_regno, "iIi", "nc")			BUILTIN(__builtin_eh_return_data_regno, "iIi", "nc")
	BUILTIN(__builtin_snprintf, "iczcC.", "nFp:2:")			BUILTIN(__builtin_snprintf, "iczcC.", "nFp:2:")
	BUILTIN(__builtin_vsprintf, "iccCa", "nFP:1:")			BUILTIN(__builtin_vsprintf, "iccCa", "nFP:1:")
	BUILTIN(__builtin_vsnprintf, "iczcCa", "nFP:2:")			BUILTIN(__builtin_vsnprintf, "iczcCa", "nFP:2:")
	BUILTIN(__builtin_thread_pointer, "v*", "nc")			BUILTIN(__builtin_thread_pointer, "v*", "nc")
	BUILTIN(__builtin_launder, "vv", "nt")			BUILTIN(__builtin_launder, "vv", "nt")
				BUILTIN(__builtin_memcpy_inline, "vvvCz", "n")
	LANGBUILTIN(__builtin_is_constant_evaluated, "b", "n", CXX_LANG)			LANGBUILTIN(__builtin_is_constant_evaluated, "b", "n", CXX_LANG)

	// GCC exception builtins			// GCC exception builtins
	BUILTIN(__builtin_eh_return, "vzv*", "r") // FIXME: Takes intptr_t, not size_t!			BUILTIN(__builtin_eh_return, "vzv*", "r") // FIXME: Takes intptr_t, not size_t!
	BUILTIN(__builtin_frob_return_addr, "vv", "n")			BUILTIN(__builtin_frob_return_addr, "vv", "n")
	BUILTIN(__builtin_dwarf_cfa, "v*", "n")			BUILTIN(__builtin_dwarf_cfa, "v*", "n")
	BUILTIN(__builtin_init_dwarf_reg_size_table, "vv*", "n")			BUILTIN(__builtin_init_dwarf_reg_size_table, "vv*", "n")
	BUILTIN(__builtin_dwarf_sp_column, "Ui", "n")			BUILTIN(__builtin_dwarf_sp_column, "Ui", "n")
	▲ Show 20 Lines • Show All 1,029 Lines • Show Last 20 Lines

clang/lib/CodeGen/CGBuilder.h

Show First 20 Lines • Show All 278 Lines • ▼ Show 20 Lines	public:
}		}
llvm::CallInst *CreateMemCpy(Address Dest, Address Src, uint64_t Size,		llvm::CallInst *CreateMemCpy(Address Dest, Address Src, uint64_t Size,
bool IsVolatile = false) {		bool IsVolatile = false) {
return CreateMemCpy(Dest.getPointer(), Dest.getAlignment().getQuantity(),		return CreateMemCpy(Dest.getPointer(), Dest.getAlignment().getQuantity(),
Src.getPointer(), Src.getAlignment().getQuantity(),		Src.getPointer(), Src.getAlignment().getQuantity(),
Size, IsVolatile);		Size, IsVolatile);
}		}

		using CGBuilderBaseTy::CreateMemCpyInline;
		llvm::CallInst *CreateMemCpyInline(Address Dest, Address Src,
		uint64_t Size) {
		return CreateMemCpyInline(
		Dest.getPointer(), Dest.getAlignment().getQuantity(), Src.getPointer(),
		Src.getAlignment().getQuantity(), getInt64(Size));
		}

using CGBuilderBaseTy::CreateMemMove;		using CGBuilderBaseTy::CreateMemMove;
llvm::CallInst CreateMemMove(Address Dest, Address Src, llvm::Value Size,		llvm::CallInst CreateMemMove(Address Dest, Address Src, llvm::Value Size,
bool IsVolatile = false) {		bool IsVolatile = false) {
return CreateMemMove(Dest.getPointer(), Dest.getAlignment().getQuantity(),		return CreateMemMove(Dest.getPointer(), Dest.getAlignment().getQuantity(),
Src.getPointer(), Src.getAlignment().getQuantity(),		Src.getPointer(), Src.getAlignment().getQuantity(),
Size, IsVolatile);		Size, IsVolatile);
}		}

Show All 12 Lines

clang/lib/CodeGen/CGBuiltin.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 2,356 Lines • ▼ Show 20 Lines	case Builtin::BI__builtin_memcpy: {
Value *SizeVal = EmitScalarExpr(E->getArg(2));		Value *SizeVal = EmitScalarExpr(E->getArg(2));
EmitNonNullArgCheck(RValue::get(Dest.getPointer()), E->getArg(0)->getType(),		EmitNonNullArgCheck(RValue::get(Dest.getPointer()), E->getArg(0)->getType(),
E->getArg(0)->getExprLoc(), FD, 0);		E->getArg(0)->getExprLoc(), FD, 0);
EmitNonNullArgCheck(RValue::get(Src.getPointer()), E->getArg(1)->getType(),		EmitNonNullArgCheck(RValue::get(Src.getPointer()), E->getArg(1)->getType(),
E->getArg(1)->getExprLoc(), FD, 1);		E->getArg(1)->getExprLoc(), FD, 1);
Builder.CreateMemCpy(Dest, Src, SizeVal, false);		Builder.CreateMemCpy(Dest, Src, SizeVal, false);
return RValue::get(Dest.getPointer());		return RValue::get(Dest.getPointer());
}		}
		case Builtin::BI__builtin_memcpy_inline: {
		Address Dest = EmitPointerWithAlignment(E->getArg(0));
		Address Src = EmitPointerWithAlignment(E->getArg(1));
		uint64_t Size =
		E->getArg(2)->EvaluateKnownConstInt(getContext()).getZExtValue();
		EmitNonNullArgCheck(RValue::get(Dest.getPointer()), E->getArg(0)->getType(),
		E->getArg(0)->getExprLoc(), FD, 0);
		EmitNonNullArgCheck(RValue::get(Src.getPointer()), E->getArg(1)->getType(),
		E->getArg(1)->getExprLoc(), FD, 1);
		Builder.CreateMemCpyInline(Dest, Src, Size);
		return RValue::get(nullptr);
		}
case Builtin::BI__builtin_char_memchr:		case Builtin::BI__builtin_char_memchr:
BuiltinID = Builtin::BI__builtin_memchr;		BuiltinID = Builtin::BI__builtin_memchr;
break;		break;

case Builtin::BI__builtin___memcpy_chk: {		case Builtin::BI__builtin___memcpy_chk: {
// fold __builtin_memcpy_chk(x, y, cst1, cst2) to memcpy iff cst1<=cst2.		// fold __builtin_memcpy_chk(x, y, cst1, cst2) to memcpy iff cst1<=cst2.
Expr::EvalResult SizeResult, DstSizeResult;		Expr::EvalResult SizeResult, DstSizeResult;
if (!E->getArg(2)->EvaluateAsInt(SizeResult, CGM.getContext()) \|\|		if (!E->getArg(2)->EvaluateAsInt(SizeResult, CGM.getContext()) \|\|
▲ Show 20 Lines • Show All 11,916 Lines • Show Last 20 Lines

clang/lib/CodeGen/CGCall.cpp

Show First 20 Lines • Show All 1,840 Lines • ▼ Show 20 Lines	if (TargetDecl) {
if (TargetDecl->hasAttr<NoReturnAttr>())		if (TargetDecl->hasAttr<NoReturnAttr>())
FuncAttrs.addAttribute(llvm::Attribute::NoReturn);		FuncAttrs.addAttribute(llvm::Attribute::NoReturn);
if (TargetDecl->hasAttr<ColdAttr>())		if (TargetDecl->hasAttr<ColdAttr>())
FuncAttrs.addAttribute(llvm::Attribute::Cold);		FuncAttrs.addAttribute(llvm::Attribute::Cold);
if (TargetDecl->hasAttr<NoDuplicateAttr>())		if (TargetDecl->hasAttr<NoDuplicateAttr>())
FuncAttrs.addAttribute(llvm::Attribute::NoDuplicate);		FuncAttrs.addAttribute(llvm::Attribute::NoDuplicate);
if (TargetDecl->hasAttr<ConvergentAttr>())		if (TargetDecl->hasAttr<ConvergentAttr>())
FuncAttrs.addAttribute(llvm::Attribute::Convergent);		FuncAttrs.addAttribute(llvm::Attribute::Convergent);
		if (const auto *Attr = TargetDecl->getAttr<NoBuiltinAttr>()) {
		// TODO: check that function names are valid for the TargetLibraryInfo.
		for(const auto& FunctionName : Attr->functionNames()){
		SmallString<32> AttributeName;
		AttributeName += "no-builtin-";
		AttributeName += FunctionName;
		FuncAttrs.addAttribute(AttributeName);
		}
		}

if (const FunctionDecl *Fn = dyn_cast<FunctionDecl>(TargetDecl)) {		if (const FunctionDecl *Fn = dyn_cast<FunctionDecl>(TargetDecl)) {
AddAttributesFromFunctionProtoType(		AddAttributesFromFunctionProtoType(
getContext(), FuncAttrs, Fn->getType()->getAs<FunctionProtoType>());		getContext(), FuncAttrs, Fn->getType()->getAs<FunctionProtoType>());
// Don't use [[noreturn]] or _Noreturn for a call to a virtual function.		// Don't use [[noreturn]] or _Noreturn for a call to a virtual function.
// These attributes are not inherited by overloads.		// These attributes are not inherited by overloads.
const CXXMethodDecl *MD = dyn_cast<CXXMethodDecl>(Fn);		const CXXMethodDecl *MD = dyn_cast<CXXMethodDecl>(Fn);
if (Fn->isNoReturn() && !(AttrOnCallSite && MD && MD->isVirtual()))		if (Fn->isNoReturn() && !(AttrOnCallSite && MD && MD->isVirtual()))
▲ Show 20 Lines • Show All 2,719 Lines • Show Last 20 Lines

clang/lib/Sema/SemaDeclAttr.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 1,098 Lines • ▼ Show 20 Lines	static void handleDiagnoseIfAttr(Sema &S, Decl *D, const ParsedAttr &AL) {
bool ArgDependent = false;		bool ArgDependent = false;
if (const auto *FD = dyn_cast<FunctionDecl>(D))		if (const auto *FD = dyn_cast<FunctionDecl>(D))
ArgDependent = ArgumentDependenceChecker(FD).referencesArgs(Cond);		ArgDependent = ArgumentDependenceChecker(FD).referencesArgs(Cond);
D->addAttr(::new (S.Context) DiagnoseIfAttr(		D->addAttr(::new (S.Context) DiagnoseIfAttr(
AL.getRange(), S.Context, Cond, Msg, DiagType, ArgDependent,		AL.getRange(), S.Context, Cond, Msg, DiagType, ArgDependent,
cast<NamedDecl>(D), AL.getAttributeSpellingListIndex()));		cast<NamedDecl>(D), AL.getAttributeSpellingListIndex()));
}		}

		static void handleNoBuiltin(Sema &S, Decl *D, const ParsedAttr &AL) {
		if (D->hasAttr<NoBuiltinAttr>()) {
		S.Diag(D->getBeginLoc(), diag::err_attribute_only_once_per_parameter) << AL;
		return;
		}

		if (!checkAttributeAtLeastNumArgs(S, AL, 1))
		return;

		std::vector<StringRef> FunctionNames;
		for (unsigned I = 0, E = AL.getNumArgs(); I != E; ++I) {
		StringRef FunctionName;
		SourceLocation LiteralLoc;
		if (!S.checkStringLiteralArgumentAttr(AL, I, FunctionName, &LiteralLoc))
		return;
		// Check valid function name.
		FunctionNames.push_back(FunctionName);
		}

		D->addAttr(::new (S.Context) NoBuiltinAttr(
		AL.getRange(), S.Context, FunctionNames.data(), FunctionNames.size(),
		AL.getAttributeSpellingListIndex()));
		}

static void handlePassObjectSizeAttr(Sema &S, Decl *D, const ParsedAttr &AL) {		static void handlePassObjectSizeAttr(Sema &S, Decl *D, const ParsedAttr &AL) {
if (D->hasAttr<PassObjectSizeAttr>()) {		if (D->hasAttr<PassObjectSizeAttr>()) {
S.Diag(D->getBeginLoc(), diag::err_attribute_only_once_per_parameter) << AL;		S.Diag(D->getBeginLoc(), diag::err_attribute_only_once_per_parameter) << AL;
return;		return;
}		}

Expr *E = AL.getArgAsExpr(0);		Expr *E = AL.getArgAsExpr(0);
uint32_t Type;		uint32_t Type;
▲ Show 20 Lines • Show All 5,626 Lines • ▼ Show 20 Lines	case ParsedAttr::AT_Destructor:
handleDestructorAttr(S, D, AL);		handleDestructorAttr(S, D, AL);
break;		break;
case ParsedAttr::AT_EnableIf:		case ParsedAttr::AT_EnableIf:
handleEnableIfAttr(S, D, AL);		handleEnableIfAttr(S, D, AL);
break;		break;
case ParsedAttr::AT_DiagnoseIf:		case ParsedAttr::AT_DiagnoseIf:
handleDiagnoseIfAttr(S, D, AL);		handleDiagnoseIfAttr(S, D, AL);
break;		break;
		case ParsedAttr::AT_NoBuiltin:
		handleNoBuiltin(S, D, AL);
		break;
case ParsedAttr::AT_ExtVectorType:		case ParsedAttr::AT_ExtVectorType:
handleExtVectorTypeAttr(S, D, AL);		handleExtVectorTypeAttr(S, D, AL);
break;		break;
case ParsedAttr::AT_ExternalSourceSymbol:		case ParsedAttr::AT_ExternalSourceSymbol:
handleExternalSourceSymbolAttr(S, D, AL);		handleExternalSourceSymbolAttr(S, D, AL);
break;		break;
case ParsedAttr::AT_MinSize:		case ParsedAttr::AT_MinSize:
handleMinSizeAttr(S, D, AL);		handleMinSizeAttr(S, D, AL);
▲ Show 20 Lines • Show All 1,879 Lines • Show Last 20 Lines

llvm/docs/LangRef.rst

This file is larger than 256 KB, so syntax highlighting is disabled by default.

	Show First 20 Lines • Show All 11,341 Lines • ▼ Show 20 Lines
	The '``llvm.thread.pointer``' intrinsic returns a pointer to the TLS area			The '``llvm.thread.pointer``' intrinsic returns a pointer to the TLS area
	for the current thread. The exact semantics of this value are target			for the current thread. The exact semantics of this value are target
	specific: it may point to the start of TLS area, to the end, or somewhere			specific: it may point to the start of TLS area, to the end, or somewhere
	in the middle. Depending on the target, this intrinsic may read a register,			in the middle. Depending on the target, this intrinsic may read a register,
	call a helper function, read from an alternate memory space, or perform			call a helper function, read from an alternate memory space, or perform
	other operations necessary to locate the TLS area. Not all targets support			other operations necessary to locate the TLS area. Not all targets support
	this intrinsic.			this intrinsic.

				'``llvm.memcpy.inline``' Intrinsic
				^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

				Syntax:
				"""""""

				::

				declare i8* @llvm.memcpy.inline.p0i8.p0i8.i32(i8* <dest>, i8* <src>,
				i32 <len>)

				Overview:
				"""""""""

				The '``llvm.memcpy.inline``' intrinsic ...

				Semantics:
				""""""""""

				The '``llvm.memcpy.inline``' intrinsic ...

	Standard C Library Intrinsics			Standard C Library Intrinsics
	-----------------------------			-----------------------------

	LLVM provides intrinsics for a few important standard C library			LLVM provides intrinsics for a few important standard C library
	functions. These intrinsics allow source-language front-ends to pass			functions. These intrinsics allow source-language front-ends to pass
	information about the alignment of the pointer arguments to the code			information about the alignment of the pointer arguments to the code
	generator, providing opportunity for more efficient code generation.			generator, providing opportunity for more efficient code generation.

	▲ Show 20 Lines • Show All 3,791 Lines • ▼ Show 20 Lines
	Arguments:			Arguments:
	""""""""""			""""""""""

	The first argument to the '``llvm.experimental.constrained.fptrunc``'			The first argument to the '``llvm.experimental.constrained.fptrunc``'
	intrinsic must be :ref:`floating point <t_floating>` or :ref:`vector			intrinsic must be :ref:`floating point <t_floating>` or :ref:`vector
	<t_vector>` of floating point values. This argument must be larger in size			<t_vector>` of floating point values. This argument must be larger in size
	than the result.			than the result.

	The second and third arguments specify the rounding mode and exception			The second and third arguments specify the rounding mode and exception
	behavior as described above.			behavior as described above.

	Semantics:			Semantics:
	""""""""""			""""""""""

	The result produced is a floating point value truncated to be smaller in size			The result produced is a floating point value truncated to be smaller in size
	than the operand.			than the operand.

	'``llvm.experimental.constrained.fpext``' Intrinsic			'``llvm.experimental.constrained.fpext``' Intrinsic
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^			^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

	Syntax:			Syntax:
	"""""""			"""""""

	::			::

	declare <ty2>			declare <ty2>
	@llvm.experimental.constrained.fpext(<type> <value>,			@llvm.experimental.constrained.fpext(<type> <value>,
	metadata <exception behavior>)			metadata <exception behavior>)

	Overview:			Overview:
	"""""""""			"""""""""

	The '``llvm.experimental.constrained.fpext``' intrinsic extends a			The '``llvm.experimental.constrained.fpext``' intrinsic extends a
	floating-point ``value`` to a larger floating-point value.			floating-point ``value`` to a larger floating-point value.

	Arguments:			Arguments:
	""""""""""			""""""""""

	The first argument to the '``llvm.experimental.constrained.fpext``'			The first argument to the '``llvm.experimental.constrained.fpext``'
	intrinsic must be :ref:`floating point <t_floating>` or :ref:`vector			intrinsic must be :ref:`floating point <t_floating>` or :ref:`vector
	<t_vector>` of floating point values. This argument must be smaller in size			<t_vector>` of floating point values. This argument must be smaller in size
	▲ Show 20 Lines • Show All 2,055 Lines • Show Last 20 Lines

llvm/include/llvm/IR/IRBuilder.h

Show First 20 Lines • Show All 454 Lines • ▼ Show 20 Lines	public:

CallInst CreateMemCpy(Value Dst, unsigned DstAlign, Value *Src,		CallInst CreateMemCpy(Value Dst, unsigned DstAlign, Value *Src,
unsigned SrcAlign, Value *Size,		unsigned SrcAlign, Value *Size,
bool isVolatile = false, MDNode *TBAATag = nullptr,		bool isVolatile = false, MDNode *TBAATag = nullptr,
MDNode *TBAAStructTag = nullptr,		MDNode *TBAAStructTag = nullptr,
MDNode *ScopeTag = nullptr,		MDNode *ScopeTag = nullptr,
MDNode *NoAliasTag = nullptr);		MDNode *NoAliasTag = nullptr);

		CallInst CreateMemCpyInline(Value Dst, unsigned DstAlign, Value *Src,
		unsigned SrcAlign, Value *Size);

/// Create and insert an element unordered-atomic memcpy between the		/// Create and insert an element unordered-atomic memcpy between the
/// specified pointers.		/// specified pointers.
///		///
/// DstAlign/SrcAlign are the alignments of the Dst/Src pointers, respectively.		/// DstAlign/SrcAlign are the alignments of the Dst/Src pointers, respectively.
///		///
/// If the pointers aren't i8*, they will be converted. If a TBAA tag is		/// If the pointers aren't i8*, they will be converted. If a TBAA tag is
/// specified, it will be added to the instruction. Likewise with alias.scope		/// specified, it will be added to the instruction. Likewise with alias.scope
/// and noalias tags.		/// and noalias tags.
▲ Show 20 Lines • Show All 1,926 Lines • Show Last 20 Lines

llvm/include/llvm/IR/IntrinsicInst.h

Show First 20 Lines • Show All 572 Lines • ▼ Show 20 Lines	public:
}		}

void setVolatile(Constant *V) { setArgOperand(ARG_VOLATILE, V); }		void setVolatile(Constant *V) { setArgOperand(ARG_VOLATILE, V); }

// Methods for support type inquiry through isa, cast, and dyn_cast:		// Methods for support type inquiry through isa, cast, and dyn_cast:
static bool classof(const IntrinsicInst *I) {		static bool classof(const IntrinsicInst *I) {
switch (I->getIntrinsicID()) {		switch (I->getIntrinsicID()) {
case Intrinsic::memcpy:		case Intrinsic::memcpy:
		case Intrinsic::memcpy_inline:
case Intrinsic::memmove:		case Intrinsic::memmove:
case Intrinsic::memset:		case Intrinsic::memset:
return true;		return true;
default: return false;		default: return false;
}		}
}		}
static bool classof(const Value *V) {		static bool classof(const Value *V) {
return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V));		return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V));
Show All 12 Lines	public:
}		}
};		};

/// This class wraps the llvm.memcpy/memmove intrinsics.		/// This class wraps the llvm.memcpy/memmove intrinsics.
class MemTransferInst : public MemTransferBase<MemIntrinsic> {		class MemTransferInst : public MemTransferBase<MemIntrinsic> {
public:		public:
// Methods for support type inquiry through isa, cast, and dyn_cast:		// Methods for support type inquiry through isa, cast, and dyn_cast:
static bool classof(const IntrinsicInst *I) {		static bool classof(const IntrinsicInst *I) {
return I->getIntrinsicID() == Intrinsic::memcpy \|\|		switch (I->getIntrinsicID()) {
I->getIntrinsicID() == Intrinsic::memmove;		case Intrinsic::memcpy:
		case Intrinsic::memcpy_inline:
		case Intrinsic::memmove:
		return true;
		default:
		return false;
		}
}		}
static bool classof(const Value *V) {		static bool classof(const Value *V) {
return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V));		return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V));
}		}
};		};

/// This class wraps the llvm.memcpy intrinsic.		/// This class wraps the llvm.memcpy intrinsic.
class MemCpyInst : public MemTransferInst {		class MemCpyInst : public MemTransferInst {
public:		public:
// Methods for support type inquiry through isa, cast, and dyn_cast:		// Methods for support type inquiry through isa, cast, and dyn_cast:
static bool classof(const IntrinsicInst *I) {		static bool classof(const IntrinsicInst *I) {
return I->getIntrinsicID() == Intrinsic::memcpy;		return I->getIntrinsicID() == Intrinsic::memcpy;
}		}
static bool classof(const Value *V) {		static bool classof(const Value *V) {
return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V));		return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V));
}		}
};		};

		/// This class wraps the llvm.memcpy.inline intrinsic.
		class MemCpyInlineInst : public MemTransferInst {
		public:
		// Methods for support type inquiry through isa, cast, and dyn_cast:
		static bool classof(const IntrinsicInst *I) {
		return I->getIntrinsicID() == Intrinsic::memcpy_inline;
		}
		static bool classof(const Value *V) {
		return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V));
		}
		};

/// This class wraps the llvm.memmove intrinsic.		/// This class wraps the llvm.memmove intrinsic.
class MemMoveInst : public MemTransferInst {		class MemMoveInst : public MemTransferInst {
public:		public:
// Methods for support type inquiry through isa, cast, and dyn_cast:		// Methods for support type inquiry through isa, cast, and dyn_cast:
static bool classof(const IntrinsicInst *I) {		static bool classof(const IntrinsicInst *I) {
return I->getIntrinsicID() == Intrinsic::memmove;		return I->getIntrinsicID() == Intrinsic::memmove;
}		}
static bool classof(const Value *V) {		static bool classof(const Value *V) {
▲ Show 20 Lines • Show All 229 Lines • Show Last 20 Lines

llvm/include/llvm/IR/Intrinsics.td

	Show First 20 Lines • Show All 477 Lines • ▼ Show 20 Lines
	// A call to profile runtime for value profiling of target expressions			// A call to profile runtime for value profiling of target expressions
	// through instrumentation based profiling.			// through instrumentation based profiling.
	def int_instrprof_value_profile : Intrinsic<[],			def int_instrprof_value_profile : Intrinsic<[],
	[llvm_ptr_ty, llvm_i64_ty,			[llvm_ptr_ty, llvm_i64_ty,
	llvm_i64_ty, llvm_i32_ty,			llvm_i64_ty, llvm_i32_ty,
	llvm_i32_ty],			llvm_i32_ty],
	[]>;			[]>;

				// Memcpy semantic that is guaranteed to be inlined.
				def int_memcpy_inline
				: Intrinsic<[],
				[ llvm_anyptr_ty, llvm_anyptr_ty, llvm_anyint_ty, llvm_i1_ty ],
				[ IntrArgMemOnly, NoCapture<0>, NoCapture<1>, WriteOnly<0>, ReadOnly<1>,
				ImmArg<2>, ImmArg<3> ]>;

	//===------------------- Standard C Library Intrinsics --------------------===//			//===------------------- Standard C Library Intrinsics --------------------===//
	//			//

	def int_memcpy : Intrinsic<[],			def int_memcpy : Intrinsic<[],
	[llvm_anyptr_ty, llvm_anyptr_ty, llvm_anyint_ty,			[llvm_anyptr_ty, llvm_anyptr_ty, llvm_anyint_ty,
	llvm_i1_ty],			llvm_i1_ty],
	[IntrArgMemOnly, NoCapture<0>, NoCapture<1>,			[IntrArgMemOnly, NoCapture<0>, NoCapture<1>,
	WriteOnly<0>, ReadOnly<1>, ImmArg<3>]>;			WriteOnly<0>, ReadOnly<1>, ImmArg<3>]>;
	▲ Show 20 Lines • Show All 712 Lines • Show Last 20 Lines

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 5,609 Lines • ▼ Show 20 Lines	case Intrinsic::write_register: {
return;		return;
}		}
case Intrinsic::setjmp:		case Intrinsic::setjmp:
lowerCallToExternalSymbol(I, &"_setjmp"[!TLI.usesUnderscoreSetJmp()]);		lowerCallToExternalSymbol(I, &"_setjmp"[!TLI.usesUnderscoreSetJmp()]);
return;		return;
case Intrinsic::longjmp:		case Intrinsic::longjmp:
lowerCallToExternalSymbol(I, &"_longjmp"[!TLI.usesUnderscoreLongJmp()]);		lowerCallToExternalSymbol(I, &"_longjmp"[!TLI.usesUnderscoreLongJmp()]);
return;		return;
		case Intrinsic::memcpy_inline:
case Intrinsic::memcpy: {		case Intrinsic::memcpy: {
const auto &MCI = cast<MemCpyInst>(I);		const auto &MCI = cast<MemTransferInst>(I);
		assert((isa<MemCpyInlineInst>(I) \|\| isa<MemCpyInst>(I)) &&
		"must be a memcpy");
SDValue Op1 = getValue(I.getArgOperand(0));		SDValue Op1 = getValue(I.getArgOperand(0));
SDValue Op2 = getValue(I.getArgOperand(1));		SDValue Op2 = getValue(I.getArgOperand(1));
SDValue Op3 = getValue(I.getArgOperand(2));		SDValue Op3 = getValue(I.getArgOperand(2));
// @llvm.memcpy defines 0 and 1 to both mean no alignment.		// @llvm.memcpy defines 0 and 1 to both mean no alignment.
unsigned DstAlign = std::max<unsigned>(MCI.getDestAlignment(), 1);		unsigned DstAlign = std::max<unsigned>(MCI.getDestAlignment(), 1);
unsigned SrcAlign = std::max<unsigned>(MCI.getSourceAlignment(), 1);		unsigned SrcAlign = std::max<unsigned>(MCI.getSourceAlignment(), 1);
unsigned Align = MinAlign(DstAlign, SrcAlign);		unsigned Align = MinAlign(DstAlign, SrcAlign);
bool isVol = MCI.isVolatile();		bool isVol = MCI.isVolatile();
bool isTC = I.isTailCall() && isInTailCallPosition(&I, DAG.getTarget());		bool isTC = I.isTailCall() && isInTailCallPosition(&I, DAG.getTarget());
// FIXME: Support passing different dest/src alignments to the memcpy DAG		// FIXME: Support passing different dest/src alignments to the memcpy DAG
// node.		// node.
		bool isAlwaysInline =
		isa<MemCpyInlineInst>(I) \|\| I.hasFnAttr("no-builtin-memcpy");
SDValue MC = DAG.getMemcpy(getRoot(), sdl, Op1, Op2, Op3, Align, isVol,		SDValue MC = DAG.getMemcpy(getRoot(), sdl, Op1, Op2, Op3, Align, isVol,
false, isTC,		isAlwaysInline, isTC,
MachinePointerInfo(I.getArgOperand(0)),		MachinePointerInfo(I.getArgOperand(0)),
MachinePointerInfo(I.getArgOperand(1)));		MachinePointerInfo(I.getArgOperand(1)));
updateDAGForMaybeTailCall(MC);		updateDAGForMaybeTailCall(MC);
return;		return;
}		}
case Intrinsic::memset: {		case Intrinsic::memset: {
const auto &MSI = cast<MemSetInst>(I);		const auto &MSI = cast<MemSetInst>(I);
SDValue Op1 = getValue(I.getArgOperand(0));		SDValue Op1 = getValue(I.getArgOperand(0));
▲ Show 20 Lines • Show All 5,285 Lines • Show Last 20 Lines

llvm/lib/IR/IRBuilder.cpp

Show First 20 Lines • Show All 90 Lines • ▼ Show 20 Lines	static InvokeInst createInvokeHelper(Function Invokee, BasicBlock *NormalDest,
InvokeInst *II =		InvokeInst *II =
InvokeInst::Create(Invokee, NormalDest, UnwindDest, Ops, Name);		InvokeInst::Create(Invokee, NormalDest, UnwindDest, Ops, Name);
Builder->GetInsertBlock()->getInstList().insert(Builder->GetInsertPoint(),		Builder->GetInsertBlock()->getInstList().insert(Builder->GetInsertPoint(),
II);		II);
Builder->SetInstDebugLocation(II);		Builder->SetInstDebugLocation(II);
return II;		return II;
}		}

		static void ForwardAttribute(const Function *F, StringRef Attribute,
		CallInst *CI) {
		if (F->hasFnAttribute(Attribute)) {
		CI->addAttribute(AttributeList::FunctionIndex,
		F->getFnAttribute(Attribute));
		}
		}

CallInst *IRBuilderBase::		CallInst *IRBuilderBase::
CreateMemSet(Value Ptr, Value Val, Value *Size, unsigned Align,		CreateMemSet(Value Ptr, Value Val, Value *Size, unsigned Align,
bool isVolatile, MDNode TBAATag, MDNode ScopeTag,		bool isVolatile, MDNode TBAATag, MDNode ScopeTag,
MDNode *NoAliasTag) {		MDNode *NoAliasTag) {
Ptr = getCastedInt8PtrValue(Ptr);		Ptr = getCastedInt8PtrValue(Ptr);
Value *Ops[] = {Ptr, Val, Size, getInt1(isVolatile)};		Value *Ops[] = {Ptr, Val, Size, getInt1(isVolatile)};
Type *Tys[] = { Ptr->getType(), Size->getType() };		Type *Tys[] = { Ptr->getType(), Size->getType() };
Module *M = BB->getParent()->getParent();		Function *F = BB->getParent();
		Module *M = F->getParent();
Function *TheFn = Intrinsic::getDeclaration(M, Intrinsic::memset, Tys);		Function *TheFn = Intrinsic::getDeclaration(M, Intrinsic::memset, Tys);

CallInst *CI = createCallHelper(TheFn, Ops, this);		CallInst *CI = createCallHelper(TheFn, Ops, this);

if (Align > 0)		if (Align > 0)
cast<MemSetInst>(CI)->setDestAlignment(Align);		cast<MemSetInst>(CI)->setDestAlignment(Align);

// Set the TBAA info if present.		// Set the TBAA info if present.
if (TBAATag)		if (TBAATag)
CI->setMetadata(LLVMContext::MD_tbaa, TBAATag);		CI->setMetadata(LLVMContext::MD_tbaa, TBAATag);

if (ScopeTag)		if (ScopeTag)
CI->setMetadata(LLVMContext::MD_alias_scope, ScopeTag);		CI->setMetadata(LLVMContext::MD_alias_scope, ScopeTag);

if (NoAliasTag)		if (NoAliasTag)
CI->setMetadata(LLVMContext::MD_noalias, NoAliasTag);		CI->setMetadata(LLVMContext::MD_noalias, NoAliasTag);

		ForwardAttribute(F, "no-builtin-memset", CI);

return CI;		return CI;
}		}

CallInst *IRBuilderBase::CreateElementUnorderedAtomicMemSet(		CallInst *IRBuilderBase::CreateElementUnorderedAtomicMemSet(
Value Ptr, Value Val, Value *Size, unsigned Align, uint32_t ElementSize,		Value Ptr, Value Val, Value *Size, unsigned Align, uint32_t ElementSize,
MDNode TBAATag, MDNode ScopeTag, MDNode *NoAliasTag) {		MDNode TBAATag, MDNode ScopeTag, MDNode *NoAliasTag) {
assert(Align >= ElementSize &&		assert(Align >= ElementSize &&
"Pointer alignment must be at least element size.");		"Pointer alignment must be at least element size.");
Show All 28 Lines	CreateMemCpy(Value Dst, unsigned DstAlign, Value Src, unsigned SrcAlign,
MDNode TBAAStructTag, MDNode ScopeTag, MDNode *NoAliasTag) {		MDNode TBAAStructTag, MDNode ScopeTag, MDNode *NoAliasTag) {
assert((DstAlign == 0 \|\| isPowerOf2_32(DstAlign)) && "Must be 0 or a power of 2");		assert((DstAlign == 0 \|\| isPowerOf2_32(DstAlign)) && "Must be 0 or a power of 2");
assert((SrcAlign == 0 \|\| isPowerOf2_32(SrcAlign)) && "Must be 0 or a power of 2");		assert((SrcAlign == 0 \|\| isPowerOf2_32(SrcAlign)) && "Must be 0 or a power of 2");
Dst = getCastedInt8PtrValue(Dst);		Dst = getCastedInt8PtrValue(Dst);
Src = getCastedInt8PtrValue(Src);		Src = getCastedInt8PtrValue(Src);

Value *Ops[] = {Dst, Src, Size, getInt1(isVolatile)};		Value *Ops[] = {Dst, Src, Size, getInt1(isVolatile)};
Type *Tys[] = { Dst->getType(), Src->getType(), Size->getType() };		Type *Tys[] = { Dst->getType(), Src->getType(), Size->getType() };
Module *M = BB->getParent()->getParent();		Function *F = BB->getParent();
		Module *M = F->getParent();
Function *TheFn = Intrinsic::getDeclaration(M, Intrinsic::memcpy, Tys);		Function *TheFn = Intrinsic::getDeclaration(M, Intrinsic::memcpy, Tys);

CallInst *CI = createCallHelper(TheFn, Ops, this);		CallInst *CI = createCallHelper(TheFn, Ops, this);

auto* MCI = cast<MemCpyInst>(CI);		auto* MCI = cast<MemCpyInst>(CI);
if (DstAlign > 0)		if (DstAlign > 0)
MCI->setDestAlignment(DstAlign);		MCI->setDestAlignment(DstAlign);
if (SrcAlign > 0)		if (SrcAlign > 0)
MCI->setSourceAlignment(SrcAlign);		MCI->setSourceAlignment(SrcAlign);

// Set the TBAA info if present.		// Set the TBAA info if present.
if (TBAATag)		if (TBAATag)
CI->setMetadata(LLVMContext::MD_tbaa, TBAATag);		CI->setMetadata(LLVMContext::MD_tbaa, TBAATag);

// Set the TBAA Struct info if present.		// Set the TBAA Struct info if present.
if (TBAAStructTag)		if (TBAAStructTag)
CI->setMetadata(LLVMContext::MD_tbaa_struct, TBAAStructTag);		CI->setMetadata(LLVMContext::MD_tbaa_struct, TBAAStructTag);

if (ScopeTag)		if (ScopeTag)
CI->setMetadata(LLVMContext::MD_alias_scope, ScopeTag);		CI->setMetadata(LLVMContext::MD_alias_scope, ScopeTag);

if (NoAliasTag)		if (NoAliasTag)
CI->setMetadata(LLVMContext::MD_noalias, NoAliasTag);		CI->setMetadata(LLVMContext::MD_noalias, NoAliasTag);

		ForwardAttribute(F, "no-builtin-memcpy", CI);

		return CI;
		}

		CallInst IRBuilderBase::CreateMemCpyInline(Value Dst, unsigned DstAlign,
		Value *Src, unsigned SrcAlign,
		Value *Size) {
		assert((DstAlign == 0 \|\| isPowerOf2_32(DstAlign)) &&
		"Must be 0 or a power of 2");
		assert((SrcAlign == 0 \|\| isPowerOf2_32(SrcAlign)) &&
		"Must be 0 or a power of 2");
		Dst = getCastedInt8PtrValue(Dst);
		Src = getCastedInt8PtrValue(Src);
		Value *IsVolatile = getInt1(false);

		Value *Ops[] = {Dst, Src, Size, IsVolatile};
		Type *Tys[] = {Dst->getType(), Src->getType(), Size->getType()};
		Function *F = BB->getParent();
		Module *M = F->getParent();
		Function *TheFn = Intrinsic::getDeclaration(M, Intrinsic::memcpy_inline, Tys);

		CallInst *CI = createCallHelper(TheFn, Ops, this);

		auto *MCI = cast<MemCpyInlineInst>(CI);
		if (DstAlign > 0)
		MCI->setDestAlignment(DstAlign);
		if (SrcAlign > 0)
		MCI->setSourceAlignment(SrcAlign);

return CI;		return CI;
}		}

CallInst *IRBuilderBase::CreateElementUnorderedAtomicMemCpy(		CallInst *IRBuilderBase::CreateElementUnorderedAtomicMemCpy(
Value Dst, unsigned DstAlign, Value Src, unsigned SrcAlign, Value *Size,		Value Dst, unsigned DstAlign, Value Src, unsigned SrcAlign, Value *Size,
uint32_t ElementSize, MDNode TBAATag, MDNode TBAAStructTag,		uint32_t ElementSize, MDNode TBAATag, MDNode TBAAStructTag,
MDNode ScopeTag, MDNode NoAliasTag) {		MDNode ScopeTag, MDNode NoAliasTag) {
assert(DstAlign >= ElementSize &&		assert(DstAlign >= ElementSize &&
Show All 39 Lines	CreateMemMove(Value Dst, unsigned DstAlign, Value Src, unsigned SrcAlign,
MDNode *NoAliasTag) {		MDNode *NoAliasTag) {
assert((DstAlign == 0 \|\| isPowerOf2_32(DstAlign)) && "Must be 0 or a power of 2");		assert((DstAlign == 0 \|\| isPowerOf2_32(DstAlign)) && "Must be 0 or a power of 2");
assert((SrcAlign == 0 \|\| isPowerOf2_32(SrcAlign)) && "Must be 0 or a power of 2");		assert((SrcAlign == 0 \|\| isPowerOf2_32(SrcAlign)) && "Must be 0 or a power of 2");
Dst = getCastedInt8PtrValue(Dst);		Dst = getCastedInt8PtrValue(Dst);
Src = getCastedInt8PtrValue(Src);		Src = getCastedInt8PtrValue(Src);

Value *Ops[] = {Dst, Src, Size, getInt1(isVolatile)};		Value *Ops[] = {Dst, Src, Size, getInt1(isVolatile)};
Type *Tys[] = { Dst->getType(), Src->getType(), Size->getType() };		Type *Tys[] = { Dst->getType(), Src->getType(), Size->getType() };
Module *M = BB->getParent()->getParent();		Function *F = BB->getParent();
		Module *M = F->getParent();
Function *TheFn = Intrinsic::getDeclaration(M, Intrinsic::memmove, Tys);		Function *TheFn = Intrinsic::getDeclaration(M, Intrinsic::memmove, Tys);

CallInst *CI = createCallHelper(TheFn, Ops, this);		CallInst *CI = createCallHelper(TheFn, Ops, this);

auto *MMI = cast<MemMoveInst>(CI);		auto *MMI = cast<MemMoveInst>(CI);
if (DstAlign > 0)		if (DstAlign > 0)
MMI->setDestAlignment(DstAlign);		MMI->setDestAlignment(DstAlign);
if (SrcAlign > 0)		if (SrcAlign > 0)
MMI->setSourceAlignment(SrcAlign);		MMI->setSourceAlignment(SrcAlign);

// Set the TBAA info if present.		// Set the TBAA info if present.
if (TBAATag)		if (TBAATag)
CI->setMetadata(LLVMContext::MD_tbaa, TBAATag);		CI->setMetadata(LLVMContext::MD_tbaa, TBAATag);

if (ScopeTag)		if (ScopeTag)
CI->setMetadata(LLVMContext::MD_alias_scope, ScopeTag);		CI->setMetadata(LLVMContext::MD_alias_scope, ScopeTag);

if (NoAliasTag)		if (NoAliasTag)
CI->setMetadata(LLVMContext::MD_noalias, NoAliasTag);		CI->setMetadata(LLVMContext::MD_noalias, NoAliasTag);

		ForwardAttribute(F, "no-builtin-memmove", CI);

return CI;		return CI;
}		}

CallInst *IRBuilderBase::CreateElementUnorderedAtomicMemMove(		CallInst *IRBuilderBase::CreateElementUnorderedAtomicMemMove(
Value Dst, unsigned DstAlign, Value Src, unsigned SrcAlign, Value *Size,		Value Dst, unsigned DstAlign, Value Src, unsigned SrcAlign, Value *Size,
uint32_t ElementSize, MDNode TBAATag, MDNode TBAAStructTag,		uint32_t ElementSize, MDNode TBAATag, MDNode TBAAStructTag,
MDNode ScopeTag, MDNode NoAliasTag) {		MDNode ScopeTag, MDNode NoAliasTag) {
assert(DstAlign >= ElementSize &&		assert(DstAlign >= ElementSize &&
▲ Show 20 Lines • Show All 480 Lines • Show Last 20 Lines

llvm/lib/Target/X86/X86SelectionDAGInfo.cpp

Show First 20 Lines • Show All 308 Lines • ▼ Show 20 Lines	SDValue X86SelectionDAGInfo::EmitTargetCodeForMemcpy(

/// Handle constant sizes,		/// Handle constant sizes,
if (ConstantSDNode *ConstantSize = dyn_cast<ConstantSDNode>(Size))		if (ConstantSDNode *ConstantSize = dyn_cast<ConstantSDNode>(Size))
return emitConstantSizeRepmov(DAG, Subtarget, dl, Chain, Dst, Src,		return emitConstantSizeRepmov(DAG, Subtarget, dl, Chain, Dst, Src,
ConstantSize->getZExtValue(),		ConstantSize->getZExtValue(),
Size.getValueType(), Align, isVolatile,		Size.getValueType(), Align, isVolatile,
AlwaysInline, DstPtrInfo, SrcPtrInfo);		AlwaysInline, DstPtrInfo, SrcPtrInfo);

		/// Handle runtime sizes through repmovsb when we AlwaysInline.
		if (AlwaysInline)
		return emitRepmovs(Subtarget, DAG, dl, Chain, Dst, Src, Size, MVT::i8);

return SDValue();		return SDValue();
}		}

llvm/test/CodeGen/X86/memcpy-inline.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=core2 \| FileCheck %s

				declare void @llvm.memcpy.inline.p0i8.p0i8.i64(i8* nocapture, i8* nocapture, i64, i1) nounwind

				define void @test1(i8* %a, i8* %b) nounwind {
				; CHECK-LABEL: test1:
				; CHECK: # %bb.0:
				; CHECK-NEXT: movq (%rsi), %rax
				; CHECK-NEXT: movq %rax, (%rdi)
				; CHECK-NEXT: retq
				tail call void @llvm.memcpy.inline.p0i8.p0i8.i64(i8* %a, i8* %b, i64 8, i1 0 )
				ret void
				}

llvm/test/CodeGen/X86/memcpy.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=core2 \| FileCheck %s -check-prefix=LINUX			; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=core2 \| FileCheck %s -check-prefix=LINUX
	; RUN: llc < %s -mtriple=x86_64-apple-darwin -mcpu=core2 \| FileCheck %s -check-prefix=DARWIN			; RUN: llc < %s -mtriple=x86_64-apple-darwin -mcpu=core2 \| FileCheck %s -check-prefix=DARWIN

	declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture, i8* nocapture, i64, i1) nounwind			declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture, i8* nocapture, i64, i1) nounwind
	declare void @llvm.memcpy.p256i8.p256i8.i64(i8 addrspace(256)* nocapture, i8 addrspace(256)* nocapture, i64, i1) nounwind			declare void @llvm.memcpy.p256i8.p256i8.i64(i8 addrspace(256)* nocapture, i8 addrspace(256)* nocapture, i64, i1) nounwind


	; Variable memcpy's should lower to calls.			; Variable memcpy's should lower to calls.
	define i8* @test1(i8* %a, i8* %b, i64 %n) nounwind {			define void @test1(i8* %a, i8* %b, i64 %n) nounwind {
	; LINUX-LABEL: test1:			; LINUX-LABEL: test1:
	; LINUX: # %bb.0: # %entry			; LINUX: # %bb.0: # %entry
	; LINUX-NEXT: jmp memcpy # TAILCALL			; LINUX-NEXT: jmp memcpy # TAILCALL
	;			;
	; DARWIN-LABEL: test1:			; DARWIN-LABEL: test1:
	; DARWIN: ## %bb.0: ## %entry			; DARWIN: ## %bb.0: ## %entry
	; DARWIN-NEXT: jmp _memcpy ## TAILCALL			; DARWIN-NEXT: jmp _memcpy ## TAILCALL
	entry:			entry:
	tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %a, i8* %b, i64 %n, i1 0 )			tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %a, i8* %b, i64 %n, i1 0 )
	ret i8* %a			ret void
	}			}

	; Variable memcpy's should lower to calls.			; Variable memcpy's should lower to calls.
	define i8* @test2(i64* %a, i64* %b, i64 %n) nounwind {			define void @test2(i64* %a, i64* %b, i64 %n) nounwind {
	; LINUX-LABEL: test2:			; LINUX-LABEL: test2:
	; LINUX: # %bb.0: # %entry			; LINUX: # %bb.0: # %entry
	; LINUX-NEXT: jmp memcpy # TAILCALL			; LINUX-NEXT: jmp memcpy # TAILCALL
	;			;
	; DARWIN-LABEL: test2:			; DARWIN-LABEL: test2:
	; DARWIN: ## %bb.0: ## %entry			; DARWIN: ## %bb.0: ## %entry
	; DARWIN-NEXT: jmp _memcpy ## TAILCALL			; DARWIN-NEXT: jmp _memcpy ## TAILCALL
	entry:			entry:
	%tmp14 = bitcast i64* %a to i8*			%tmp14 = bitcast i64* %a to i8*
	%tmp25 = bitcast i64* %b to i8*			%tmp25 = bitcast i64* %b to i8*
	tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 %tmp14, i8* align 8 %tmp25, i64 %n, i1 0 )			tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 8 %tmp14, i8* align 8 %tmp25, i64 %n, i1 0 )
	ret i8* %tmp14			ret void
				}

				; Variable length memcpy's with disabled runtime should lower to repmovsb.
				define void @memcpy_no_runtime(i8* %a, i8* %b, i64 %n) nounwind {
				; LINUX-LABEL: memcpy_no_runtime:
				; LINUX: # %bb.0: # %entry
				; LINUX-NEXT: movq %rdx, %rcx
				; LINUX-NEXT: rep;movsb (%rsi), %es:(%rdi)
				; LINUX-NEXT: retq
				;
				; DARWIN-LABEL: memcpy_no_runtime:
				; DARWIN: ## %bb.0: ## %entry
				; DARWIN-NEXT: movq %rdx, %rcx
				; DARWIN-NEXT: rep;movsb (%rsi), %es:(%rdi)
				; DARWIN-NEXT: retq
				entry:
				tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* %a, i8* %b, i64 %n, i1 0 ) "no-builtin-memcpy"
				ret void
	}			}

	; Large constant memcpy's should lower to a call when optimizing for size.			; Large constant memcpy's should lower to a call when optimizing for size.
	; PR6623			; PR6623

	; On the other hand, Darwin's definition of -Os is optimizing for size without			; On the other hand, Darwin's definition of -Os is optimizing for size without
	; hurting performance so it should just ignore optsize when expanding memcpy.			; hurting performance so it should just ignore optsize when expanding memcpy.
	; rdar://8821501			; rdar://8821501
	▲ Show 20 Lines • Show All 200 Lines • Show Last 20 Lines

This is an archive of the discontinued LLVM Phabricator instance.

[clang/llvm] Allow efficient implementation of libc's memory functions in C/C++AbandonedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 203386

clang/include/clang/Basic/Attr.td

clang/include/clang/Basic/AttrDocs.td

clang/include/clang/Basic/Builtins.def

clang/lib/CodeGen/CGBuilder.h

clang/lib/CodeGen/CGBuiltin.cpp

clang/lib/CodeGen/CGCall.cpp

clang/lib/Sema/SemaDeclAttr.cpp

llvm/docs/LangRef.rst

llvm/include/llvm/IR/IRBuilder.h

llvm/include/llvm/IR/IntrinsicInst.h

llvm/include/llvm/IR/Intrinsics.td

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

llvm/lib/IR/IRBuilder.cpp

llvm/lib/Target/X86/X86SelectionDAGInfo.cpp

llvm/test/CodeGen/X86/memcpy-inline.ll

llvm/test/CodeGen/X86/memcpy.ll

[clang/llvm] Allow efficient implementation of libc's memory functions in C/C++
AbandonedPublic