
[SelectionDAG][RFC] Allow the user to specify a memeq function (v5).
Closed, Public

Authored by courbet on Jan 11 2019, 4:45 AM.

Details

Summary

Right now, when we encounter a string equality check,
e.g. if (memcmp(a, b, s) == 0), we try to expand it to an inline comparison if s is a
small compile-time constant, and otherwise fall back on calling memcmp().

This is sub-optimal because memcmp has to compute much more than
equality.

This patch replaces memcmp(a, b, s) == 0 by bcmp(a, b, s) == 0 on platforms
that support bcmp.

bcmp can be made much more efficient than memcmp because an equality
comparison is trivially parallel, while lexicographic ordering has a chain
dependency.
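
For intuition, here is a minimal sketch of an equality-only comparison (purely illustrative; this is neither the patch nor any libc implementation). Each word's XOR is independent and the ORs form a simple reduction, so there is no serial chain like the one lexicographic ordering needs to find the first differing byte:

#include <stddef.h>
#include <string.h>

/* Hypothetical bcmp-style routine: returns 0 iff the buffers are equal. */
static int equality_only_cmp(const void *a, const void *b, size_t n) {
  const unsigned char *pa = a, *pb = b;
  unsigned long acc = 0;
  size_t i = 0;
  for (; i + sizeof(unsigned long) <= n; i += sizeof(unsigned long)) {
    unsigned long wa, wb;
    memcpy(&wa, pa + i, sizeof wa);   /* unaligned-safe word loads */
    memcpy(&wb, pb + i, sizeof wb);
    acc |= wa ^ wb;                   /* independent across iterations */
  }
  for (; i < n; ++i)                  /* leftover tail bytes */
    acc |= (unsigned long)(pa[i] ^ pb[i]);
  return acc != 0;
}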


Event Timeline

courbet created this revision. Jan 11 2019, 4:45 AM
xbolva00 added inline comments.
lib/Transforms/Utils/SimplifyLibCalls.cpp
917

Move code and create function emitBcmp in BuildLibCalls

gchatelet added inline comments. Feb 5 2019, 1:50 AM
lib/Analysis/TargetLibraryInfo.cpp
54

typo 'the the'

courbet updated this revision to Diff 185288. Feb 5 2019, 5:31 AM
courbet marked 2 inline comments as done.

Refactor emitBCmp() to BuildLibCalls.

Thanks for the comments.

It'd be great to see this publicized more widely, beyond just the clang community. If libc implementors are aware of the gains and are willing to provide an actually-faster bcmp implementation, that would be a lot better than having an optimization that doesn't really optimize anything unless users provide their own bcmp implementation.

Given the potential for gains reported, I'd hate to see this as a change that people can't actually take advantage of.

A couple things I'd worry about, which I think this change is doing properly, but just to double check:

  • I assume that with -ffreestanding, this will be disabled.
  • Some folks avoid -ffreestanding, even though they have a freestanding implementation (sigh). For them, I assume -fno-builtin=bcmp will also disable this.

We should document this change for such folk, as they will need to either add the flag, or provide their own bcmp implementation.

Some other transforms in SimplifyLibCalls transform strcmp and strncmp into memcmp. I'm not sure if these optimizations will iterate or not -- will this properly transform strcmp -> memcmp -> bcmp, where appropriate?

Finally, I note that we don't optimize user code which calls bcmp the way we do user code which calls memcmp: neither ExpandMemCmp nor SimplifyLibCalls handles bcmp. While that's not something that needs to be done simultaneously with this change, we probably should do it.

It'd be great to see this publicized more widely, beyond just the clang community. If libc implementors are aware of the gains and are willing to provide an actually-faster bcmp implementation, that would be a lot better than having an optimization that doesn't really optimize anything unless users provide their own bcmp implementation.

I agree; it would be great to see this potential gain converted into an actual gain for everybody.
I'm working on the optimized internal bcmp; we're still discussing how to contribute it to glibc (the implementation relies heavily on C++ features).

A couple things I'd worry about, which I think this change is doing properly, but just to double check:

  • I assume that with -ffreestanding, this will be disabled.
  • Some folks avoid -ffreestanding, even though they have a freestanding implementation (sigh). For them, I assume -fno-builtin=bcmp will also disable this.

Unfortunately, LLVM's -ffreestanding doesn't seem to conform to the standard, so it's already broken in a way.
Clang considers mem* to be a required part of freestanding: using -nostdlib and -ffreestanding doesn't mean that Clang and LLVM won't use those functions, it means that you're responsible for providing them.
For example, passing a big struct by value with -ffreestanding will emit a call to memcpy in both C and C++ (see the sketch below).
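
A small illustration of that point (struct size and names are made up; exact lowering depends on the target):

/* Even with -ffreestanding, the by-value copy of a large aggregate is
 * typically lowered to a call to memcpy, which the environment must provide. */
struct Big { char bytes[4096]; };

void consume(struct Big b);            /* hypothetical callee */

void pass_by_value(const struct Big *p) {
  consume(*p);                         /* the argument copy likely becomes a memcpy call */
}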

That said, I agree it's better to disable bcmp for -ffreestanding.

courbet updated this revision to Diff 186426. Feb 12 2019, 2:21 AM

Add a test to show that the SimplifyLibCalls transforms compose (strcmp -> memcmp -> bcmp).

It'd be great to see this publicized more widely, beyond just the clang community. If libc implementors are aware of the gains and are willing to provide an actually-faster bcmp implementation, that would be a lot better than having an optimization that doesn't really optimize anything unless users provide their own bcmp implementation.

Given the potential for gains reported, I'd hate to see this as a change that people can't actually take advantage of.

A couple things I'd worry about, which I think this change is doing properly, but just to double check:

  • I assume that with -ffreestanding, this will be disabled.
  • Some folks avoid -ffreestanding, even though they have a freestanding implementation (sigh). For them, I assume -fno-builtin=bcmp will also disable this. We should document this change for such folk, as they will need to either add the flag, or provide their own bcmp implementation.

When compiling the following code:

return memcmp(a, b, i) == 0;

1 - -ffreestanding will emit memcmp() instead of bcmp(), because freestanding disables bcmp() as a library function.
2 - -fno-builtin-memcmp will emit memcmp() instead of bcmp(), because this one instructs LLVM to not treat calls to memcmp() specially.
3 - -fno-builtin-bcmp will still emit bcmp(). AFAICT this is reasonable, because the flag is supposed to tell clang/llvm not to treat bcmp specially, not to avoid using it (that's what freestanding is for). That being said, I can see why the behaviour you're suggesting is good for users, so we probably want to teach clang to treat bcmp as a builtin: https://reviews.llvm.org/D58120. (The three cases are summarized in the sketch below.)
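
To make the three cases concrete, here is the same snippet as a standalone file, with the lowering described above noted in comments (file and function names are made up, and the exact output depends on the target and optimization level):

/* equals.c -- e.g. clang -O2 -S equals.c
 *   default              -> the comparison is emitted as bcmp(a, b, i) == 0
 *   -ffreestanding       -> memcmp is kept (bcmp is not available as a libfunc)
 *   -fno-builtin-memcmp  -> memcmp is kept (the call is not treated specially)
 *   -fno-builtin-bcmp    -> bcmp is still emitted (see D58120 for the clang side)
 */
#include <string.h>

int equals(const void *a, const void *b, size_t i) {
  return memcmp(a, b, i) == 0;   /* size is not a compile-time constant */
}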

Some other transforms in SimplifyLibCalls transform strcmp and strncmp into memcmp. I'm not sure if these optimizations will iterate or not -- will this properly transform strcmp -> memcmp -> bcmp, where appropriate?

They do compose; I've added a test to show that.
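
As an illustration of the kind of pattern involved (a hypothetical function, not the test added here): with a constant string argument, SimplifyLibCalls can rewrite the strcmp into a fixed-size memcmp, and the equality-only use then makes the result eligible for the memcmp handling above.

#include <string.h>

int is_abc(const char *s) {
  /* strcmp(s, "abc") == 0  ->  memcmp(s, "abc", 4) == 0  ->  bcmp / inline expansion */
  return strcmp(s, "abc") == 0;
}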

Finally, I note that we don't optimize user code which calls bcmp the way we do user code which calls memcmp: neither ExpandMemCmp nor SimplifyLibCalls handles bcmp. While that's not something that needs to be done simultaneously with this change, we probably should do it.

That's a very good point. I've filed PR40699 for this.

OK, I'm happy with all of that.

One more issue I just thought of: I believe this will reduce sanitizer coverage, since they intercept calls to memcmp, but not (yet) bcmp.

OK, I'm happy with all of that.

One more issue I just thought of: I believe this will reduce sanitizer coverage, since they intercept calls to memcmp, but not (yet) bcmp.

Thanks, I've created D58379 for this.

D58379 is now submitted.

jyknight accepted this revision. Feb 26 2019, 6:37 AM

LGTM.

Probably worth adding a note to the release notes, something like "The optimizer will now convert calls to memcmp into calls to bcmp in some circumstances. Users who are building freestanding code (not depending on the platform's libc) without specifying -ffreestanding may need to either pass -fno-builtin-bcmp, or provide a bcmp function."

This revision is now accepted and ready to land. Feb 26 2019, 6:37 AM
courbet updated this revision to Diff 188369. Feb 26 2019, 6:57 AM
  • Update release notes

Done, thanks for the review!

This revision was automatically updated to reflect the committed changes.

This made pdfium use 6.8% more cpu, https://bugs.chromium.org/p/chromium/issues/detail?id=947611

Since the intent here was to make things faster, any ideas how this could happen?

This made pdfium use 6.8% more cpu, https://bugs.chromium.org/p/chromium/issues/detail?id=947611

Since the intent here was to make things faster, any ideas how this could happen?

Hi Nico, thanks for reporting.

This commit introduced a regression (https://bugs.llvm.org/show_bug.cgi?id=40699): memcmp with small constant sizes, e.g. memcmp(a, b, 16) == 0, was expanded as bcmp(a, b, 16) == 0 instead of being lowered to two loads + 1 compare.
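
For reference, the shape of code affected by the regression looks like this (hypothetical function, using the size from the example above):

#include <string.h>

/* Small constant size, equality-only use: this should be expanded inline
 * (a couple of wide loads plus compares) rather than calling bcmp. */
int same16(const void *a, const void *b) {
  return memcmp(a, b, 16) == 0;
}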

This was fixed in https://reviews.llvm.org/rL356550

Could that be the reason for the regression? I've tried reproducing with the commands in https://bugs.chromium.org/p/chromium/issues/detail?id=947611, but my valgrind will not handle AVX instructions:

valgrind: Unrecognised instruction at address 0x4015cc9
   at 0x4015CC9: _dl_runtime_resolve_avx_slow
...

Could that be the reason for the regression?

Though I cannot run callgrind, I had a look at the generated code with and without 356550. Before the fix, I see a lot of small constant-size calls to bcmp (~20 of them), e.g.:

leaq -0xa0(%rbp), %rbx
...
movl $0x20, %edx           # size argument: 0x20 = 32 bytes
movq %rbx, %rdi            # first buffer
leaq -0xe0(%rbp), %rsi     # second buffer
callq 0x1954c7             # call to bcmp

After the fix, these calls are inlined:

movdqa -0x80(%rbp), %xmm0  # load 16 bytes of the first buffer
movdqa -0x70(%rbp), %xmm1  # load the next 16 bytes
pcmpeqb -0xd0(%rbp), %xmm1 # byte-wise compare against the second buffer
pcmpeqb -0xe0(%rbp), %xmm0
pand %xmm1, %xmm0          # combine both 16-byte equality masks
pmovmskb %xmm0, %ecx       # one bit per byte
cmpl $0xffff, %ecx         # all 32 bytes equal iff all 16 bits are set

OK, I succeeded in profiling this using pprof instead of callgrind.

Again, before 356550 I see:

210764079 32.23% 32.23%  210764079 32.23%  CStretchEngine::ContinueStretchHorz
  98749133 15.10% 47.33%   98749133 15.10%  (anonymous namespace)::FaxG4GetRow
  76895485 11.76% 59.09%   76895485 11.76%  <unknown>
  72290505 11.06% 70.15%   72290505 11.06%  (anonymous namespace)::FindBit
  24300088  3.72% 73.86%   24300088  3.72%  CFX_ScanlineCompositor::CompositeByteMaskLine
  21546869  3.30% 77.16%   21546869  3.30%  CStretchEngine::StretchVert
  17315093  2.65% 79.81%   17315093  2.65%  [pdfium_test]
  16493189  2.52% 82.33%   16493189  2.52%  __memcmp_sse4_1
   8353611  1.28% 83.61%    8353611  1.28%  CStretchEngine::CWeightTable::Calc
   7791739  1.19% 84.80%    7791739  1.19%  CPDF_DIBBase::TranslateScanline24bppDefaultDecode

and after the change:

195287006 32.16% 32.16%  195287006 32.16%  CStretchEngine::ContinueStretchHorz
 85942976 14.15% 46.31%   85942976 14.15%  (anonymous namespace)::FaxG4GetRow
 75448209 12.42% 58.74%   75448209 12.42%  <unknown>
 64249874 10.58% 69.32%   64249874 10.58%  (anonymous namespace)::FindBit
 32990212  5.43% 74.75%   32990212  5.43%  CStretchEngine::StretchVert
 14766318  2.43% 77.18%   14766318  2.43%  CFX_ScanlineCompositor::CompositeByteMaskLine
 13827062  2.28% 79.46%   13827062  2.28%  [pdfium_test]
  9502572  1.56% 81.02%    9502572  1.56%  CPDF_DIBBase::TranslateScanline24bppDefaultDecode
  6078561  1.00% 82.02%    6078561  1.00%  CStretchEngine::CWeightTable::Calc
  5193562  0.86% 82.88%    5193562  0.86%  CPDF_TextPage::ProcessTextObject

So I'm now quite confident that de-inlining was the cause of the regression and that it was fixed by 356550.