This is an archive of the discontinued LLVM Phabricator instance.

// clang/lib/CodeGen/CodeGenModule:shouldAssumeDSOLocal
  const auto &CGOpts = CGM.getCodeGenOpts();
  llvm::Reloc::Model RM = CGOpts.RelocationModel;
  const auto &LOpts = CGM.getLangOpts();
  if (RM != llvm::Reloc::Static && !LOpts.PIE)
    return false;

-fpic/-fPIC does not set dso_local so this change does not affect that compile model.
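
A minimal sketch of the early exit quoted above, written as plain C for illustration only. The enum and the function name `passes_early_dso_local_check` are hypothetical and are not Clang APIs; only the condition itself comes from the snippet, and any further checks the real function performs are not modelled here.

```c
#include <stdbool.h>

/* Hypothetical stand-in for llvm::Reloc::Model; only two values are modelled. */
enum reloc_model { RELOC_STATIC, RELOC_PIC };

/*
 * Mirrors only the quoted condition: unless the relocation model is static
 * or PIE is enabled, the symbol is not assumed dso_local.
 * A true result means "not rejected by this check", not "is dso_local".
 */
bool passes_early_dso_local_check(enum reloc_model rm, bool pie)
{
    if (rm != RELOC_STATIC && !pie)
        return false; /* e.g. -fpic/-fPIC without PIE */
    return true;
}
```

With `RELOC_PIC` and `pie == false` (the -fpic/-fPIC case mentioned above) the function returns false, which matches the statement that this compile model is unaffected.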

In D73230#1850841, @MaskRay wrote:

-fpic/-fPIC does not set dso_local so this change does not affect that compile model.

Yep, that's how it's supposed to work. :)

lgtm

This revision is now accepted and ready to land. Jan 30 2020, 5:24 PM

In D73230#1850895, @rnk wrote:

Yep, that's how it's supposed to work. :)

lgtm

Thanks!

When -fsemantic-interposition is ready, I want to check whether we can do something more aggressive: infer dso_local for -fPIC, i.e. require -fPIC users to specify -fsemantic-interposition to get the interposition behavior.

As the summary of D73228 says, the existing -fno-semantic-interposition behaviors in various IPO optimizations and the previous assembly behavior may have given us enough license to do this.

Fix fold-add-pcrel.ll

Closed by commit rG5b22bcc2b70d: [X86][ELF] Prefer to lower MC_GlobalAddress operands to .Lfoo$local (authored by MaskRay). Jan 30 2020, 5:56 PM

This revision was automatically updated to reflect the committed changes.

Unit tests: fail. 62352 tests passed, 1 failed and 839 were skipped.

failed: libc++.std/containers/sequences/array/array_creation/to_array.fail.cpp

clang-tidy: pass.

clang-format: pass.

Harbormaster failed remote builds in B45407: Diff 241617! Jan 30 2020, 6:05 PM

nickdesaulniers added a subscriber: nickdesaulniers. Feb 12 2020, 3:15 PM

@MaskRay - this change causes a behaviour difference for --wrap.

Here is the --wrap behaviour before this change:

ben@ben-VirtualBox:~/tests/wrap$ more main.c
void __wrap_foo () {
	puts ("__wrap_foo");
	__real_foo();
}

void foo () { puts("foo()"); }

int main() {
	__real_foo();
	puts("---");
	__wrap_foo();
	puts("---");
	foo();
	return 0;
}
ben@ben-VirtualBox:~/tests/wrap$ gcc main.c -Wl,--wrap=foo -ffunction-sections -fuse-ld=lld -o lld.elf -Wno-implicit-function-declaration
ben@ben-VirtualBox:~/tests/wrap$ ./lld.elf 
foo()
---
__wrap_foo
foo()
---
__wrap_foo
foo()

… and here is the behaviour after this change:

ben@ben-VirtualBox:~/tests/wrap$ ./lld.elf 
foo()
---
__wrap_foo
foo()
---
foo()

There is no behaviour change for -flto builds so the behaviour for --wrap is now effectively different for LTO vs normal builds.

Passing-by remark: "This change passed in our internal huge code base." isn't a great expanded description for the change.

In D73230#2206232, @bd1976llvm wrote:
@MaskRay - this change causes a behaviour difference for --wrap.

Here is the --wrap behaviour before this change:
ben@ben-VirtualBox:~/tests/wrap$ more main.c
void __wrap_foo () {
	puts ("__wrap_foo");
	__real_foo();
}

void foo () { puts("foo()"); }

int main() {
	__real_foo();
	puts("---");
	__wrap_foo();
	puts("---");
	foo();
	return 0;
}
ben@ben-VirtualBox:~/tests/wrap$ gcc main.c -Wl,--wrap=foo -ffunction-sections -fuse-ld=lld -o lld.elf -Wno-implicit-function-declaration
ben@ben-VirtualBox:~/tests/wrap$ ./lld.elf 
foo()
---
__wrap_foo
foo()
---
__wrap_foo
foo()
… and here is the behaviour after this change:
ben@ben-VirtualBox:~/tests/wrap$ ./lld.elf 
foo()
---
__wrap_foo
foo()
---
foo()
There is no behaviour change for -flto builds so the behaviour for --wrap is now effectively different for LTO vs normal builds.

I think you missed a point in the description of --wrap:

You may wish to provide a "__real_malloc" function as well, so that links without the --wrap option will succeed.  If you do this, you
should not put the definition of "__real_malloc" in the same file as "__wrap_malloc"; if you do, the assembler may resolve the call
before the linker has a chance to wrap it to "malloc".

Providing a definition of foo in the translation unit where it is referenced is not reliable when you are using --wrap.
Actually, this is where GNU ld and LLD differ. See https://sourceware.org/bugzilla/show_bug.cgi?id=26358 and the history of lld/test/ELF/wrap-shlib-undefined.s

If you want guaranteed semantics, don't define foo in the translation unit where it is referenced. You may also try gcc and gcc -fPIC -fno-semantic-interposition; the behavior is similar to the latest clang.

In D73230#2207023, @MaskRay wrote:
I think you missed a point in the description of --wrap:
You may wish to provide a "__real_malloc" function as well, so that links without the --wrap option will succeed.  If you do this, you
should not put the definition of "__real_malloc" in the same file as "__wrap_malloc"; if you do, the assembler may resolve the call
before the linker has a chance to wrap it to "malloc".
Providing a definition of foo in the translation unit where it is referenced is not reliable when you are using --wrap.
Actually, this is where GNU ld and LLD differ. See https://sourceware.org/bugzilla/show_bug.cgi?id=26358 and the history of lld/test/ELF/wrap-shlib-undefined.s

If you want guaranteed semantics, don't define foo in the translation unit where it is referenced. You may also try gcc and gcc -fPIC -fno-semantic-interposition; the behavior is similar to the latest clang.

Thanks for the summary. I am not particularly concerned about which behaviour we have w.r.t. wrapping intra-translation-unit references (although I have seen some evidence that lld's behaviour is useful, e.g. https://stackoverflow.com/questions/13961774/gnu-gcc-ld-wrapping-a-call-to-symbol-with-caller-and-callee-defined-in-the-sam). However, you stated in https://sourceware.org/bugzilla/show_bug.cgi?id=26358 that with lld, -r, LTO, and normal links have the same behaviour; that is not true after this change. Furthermore, with the current clang it is not possible to go back to the old behaviour using -fsemantic-interposition for hidden symbols. IIUC, hidden symbols are probably the majority of open-source symbols now, as the GNU toolchain encourages the use of -fvisibility=hidden.

In D73230#2207159, @bd1976llvm wrote:
Thanks for the summary. I am not particularly concerned about which behaviour we have w.r.t. wrapping intra-translation-unit references (although I have seen some evidence that lld's behaviour is useful, e.g. https://stackoverflow.com/questions/13961774/gnu-gcc-ld-wrapping-a-call-to-symbol-with-caller-and-callee-defined-in-the-sam). However, you stated in https://sourceware.org/bugzilla/show_bug.cgi?id=26358 that with lld, -r, LTO, and normal links have the same behaviour; that is not true after this change. Furthermore, with the current clang it is not possible to go back to the old behaviour using -fsemantic-interposition for hidden symbols. IIUC, hidden symbols are probably the majority of open-source symbols now, as the GNU toolchain encourages the use of -fvisibility=hidden.

Let me summarize the cases:

{clang,gcc} -fuse-ld=lld main.c -Wl,--wrap=foo => wrapped (in LLD, --wrap is done after (global) symbol resolution. Definitions are wrapped as well)
{clang,gcc} -fuse-ld=bfd main.c -Wl,--wrap=foo => not wrapped (in GNU ld, --wrap is per object file. --wrap is not effective when the symbol is defined)
{clang,gcc} -fuse-ld=lld main.c -fPIC -fno-semantic-interposition -Wl,--wrap=foo => not wrapped (references go through .Lfoo$local which cannot be wrapped)
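
The three cases above can be restated as a small lookup, purely as a reading aid. The type and function names are hypothetical, and the outcomes are the ones listed in this comment for a symbol that is defined and referenced in the same translation unit; nothing here is derived from linker source. The GNU ld plus local-alias combination is not listed above, so returning false for it is an assumption.

```c
#include <stdbool.h>

enum linker { LINKER_LLD, LINKER_BFD };

/*
 * refs_use_local_alias: true when the compiler emitted references through a
 * local alias such as .Lfoo$local (the -fPIC -fno-semantic-interposition case).
 */
bool same_tu_call_is_wrapped(enum linker ld, bool refs_use_local_alias)
{
    if (refs_use_local_alias)
        return false;          /* listed above: the local alias cannot be wrapped */
    return ld == LINKER_LLD;   /* listed above: lld wraps, GNU ld does not */
}
```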

I think your integration falls between this commit and 872c5fb1432493c0a09b6f210765c0d94ce9b5d0, so for -fno-PIC or -fPIE, you observe the -fPIC -fno-semantic-interposition behavior as well. If you cherry-pick 872c5fb1432493c0a09b6f210765c0d94ce9b5d0, don't use -fno-semantic-interposition, and use LLD, you should get the wrapping behavior. (Clang traditionally has some -fno-semantic-interposition behaviors, so in the future we might be able to make -fno-semantic-interposition the default for -fPIC.)

I indeed prefer the LLD behavior, so I filed https://sourceware.org/bugzilla/show_bug.cgi?id=26358 yesterday, but I cannot say the wrapping behavior is promised. If you want better portability, make foo weak (and be aware of side effects with the change).

In D73230#2207258, @MaskRay wrote:
Let me summarize the cases:

{clang,gcc} -fuse-ld=lld main.c -Wl,--wrap=foo => wrapped (in LLD, --wrap is done after (global) symbol resolution. Definitions are wrapped as well)

{clang,gcc} -fuse-ld=bfd main.c -Wl,--wrap=foo => not wrapped (in GNU ld, --wrap is per object file. --wrap is not effective when the symbol is defined)

{clang,gcc} -fuse-ld=lld main.c -fPIC -fno-semantic-interposition -Wl,--wrap=foo => not wrapped (references go through .Lfoo$local which cannot be wrapped)

I think your integration falls between this commit and 872c5fb1432493c0a09b6f210765c0d94ce9b5d0, so for -fno-PIC or -fPIE, you observe the -fPIC -fno-semantic-interposition behavior as well. If you cherry-pick 872c5fb1432493c0a09b6f210765c0d94ce9b5d0, don't use -fno-semantic-interposition, and use LLD, you should get the wrapping behavior. (Clang traditionally has some -fno-semantic-interposition behaviors, so in the future we might be able to make -fno-semantic-interposition the default for -fPIC.)

I indeed prefer the LLD behavior, so I filed https://sourceware.org/bugzilla/show_bug.cgi?id=26358 yesterday, but I cannot say the wrapping behavior is promised. If you want better portability, make foo weak (and be aware of side effects with the change).

@MaskRay - Thanks for taking the time to look into this :)

Here are my results from an e912fffd3a8c6c9f6e09d2eac4c1ee3a32800a22 clang using my previous example. I have included a bit more context in the first relocation dump so you can see which relocation is the relevant one; in the other dumps I strip out more.

With -fpic:

$ /c/u/br2/bin/clang main.c -fno-semantic-interposition -c -fpic -target x86_64-linux-gnu -o test.o -ffunction-sections -fdata-sections -fvisibility=default -Wno-implicit-function-declaration
$ /c/u/br2/bin/llvm-readelf -r test.o                                           
...
Relocation section '.rela.text.main' at offset 0x330 contains 7 entries:
    Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
...
0000000000000041  0000000600000004 R_X86_64_PLT32         0000000000000000 .text.foo - 4
...
Relocation section '.rela.eh_frame' at offset 0x3d8 contains 3 entries:
...

$ /c/u/br2/bin/clang main.c -fsemantic-interposition -c -fpic -target x86_64-linux-gnu -o test.o -ffunction-sections -fdata-sections -fvisibility=default -Wno-implicit-function-declaration
$ /c/u/br2/bin/llvm-readelf -r test.o                                           
...
0000000000000041  0000000a00000004 R_X86_64_PLT32         0000000000000000 foo - 4
...

$ /c/u/br2/bin/clang main.c -fno-semantic-interposition -c -fpic -target x86_64-linux-gnu -o test.o -ffunction-sections -fdata-sections -fvisibility=hidden -Wno-implicit-function-declaration
$ /c/u/br2/bin/llvm-readelf -r test.o                                           
...
0000000000000041  0000000600000004 R_X86_64_PLT32         0000000000000000 .text.foo - 4
...

$ /c/u/br2/bin/clang main.c -fsemantic-interposition -c -fpic -target x86_64-linux-gnu -o test.o -ffunction-sections -fdata-sections -fvisibility=hidden -Wno-implicit-function-declaration
$ /c/u/br2/bin/llvm-readelf -r test.o                                           
...
0000000000000041  0000000600000004 R_X86_64_PLT32         0000000000000000 .text.foo - 4
…

With -fpie:

$ /c/u/br2/bin/clang main.c -fno-semantic-interposition -c -fpie -target x86_64-linux-gnu -o test.o -ffunction-sections -fdata-sections -fvisibility=default -Wno-implicit-function-declaration
$ /c/u/br2/bin/llvm-readelf -r test.o
...
0000000000000041  0000000a00000004 R_X86_64_PLT32         0000000000000000 foo - 4
...

$ /c/u/br2/bin/clang main.c -fsemantic-interposition -c -fpie -target x86_64-linux-gnu -o test.o -ffunction-sections -fdata-sections -fvisibility=default -Wno-implicit-function-declaration
$ /c/u/br2/bin/llvm-readelf -r test.o                                           
...
0000000000000041  0000000a00000004 R_X86_64_PLT32         0000000000000000 foo - 4
...

$ /c/u/br2/bin/clang main.c -fno-semantic-interposition -c -fpie -target x86_64-linux-gnu -o test.o -ffunction-sections -fdata-sections -fvisibility=hidden -Wno-implicit-function-declaration
$ /c/u/br2/bin/llvm-readelf -r test.o                                           
...
0000000000000041  0000000a00000004 R_X86_64_PLT32         0000000000000000 foo - 4
...

$ /c/u/br2/bin/clang main.c -fsemantic-interposition -c -fpie -target x86_64-linux-gnu -o test.o -ffunction-sections -fdata-sections -fvisibility=hidden -Wno-implicit-function-declaration
$ /c/u/br2/bin/llvm-readelf -r test.o                                           
...
0000000000000041  0000000a00000004 R_X86_64_PLT32         0000000000000000 foo - 4
...

So, clang's behavior has changed so that --wrap no longer wraps symbol definitions for default symbols + -fpic + -fno-semantic-interposition (-fno-semantic-interposition is the default); however, you can restore the old behavior via -fsemantic-interposition. For hidden symbols + -fpic --wrap no longer wraps symbol definitions and there is no way to restore the old definition wrapping behavior.

In D73230#2207477, @bd1976llvm wrote:
@MaskRay - Thanks for taking the time to look into this :)

Here are my results from an e912fffd3a8c6c9f6e09d2eac4c1ee3a32800a22 clang using my previous example. I have included a bit more context in the first relocation dump so you can see which relocation is the relevant one; in the other dumps I strip out more.

With -fpic:
$ /c/u/br2/bin/clang main.c -fno-semantic-interposition -c -fpic -target x86_64-linux-gnu -o test.o -ffunction-sections -fdata-sections -fvisibility=default -Wno-implicit-function-declaration
$ /c/u/br2/bin/llvm-readelf -r test.o                                           
...
Relocation section '.rela.text.main' at offset 0x330 contains 7 entries:
    Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
...
0000000000000041  0000000600000004 R_X86_64_PLT32         0000000000000000 .text.foo - 4
...
Relocation section '.rela.eh_frame' at offset 0x3d8 contains 3 entries:
...

$ /c/u/br2/bin/clang main.c -fsemantic-interposition -c -fpic -target x86_64-linux-gnu -o test.o -ffunction-sections -fdata-sections -fvisibility=default -Wno-implicit-function-declaration
$ /c/u/br2/bin/llvm-readelf -r test.o                                           
...
0000000000000041  0000000a00000004 R_X86_64_PLT32         0000000000000000 foo - 4
...

$ /c/u/br2/bin/clang main.c -fno-semantic-interposition -c -fpic -target x86_64-linux-gnu -o test.o -ffunction-sections -fdata-sections -fvisibility=hidden -Wno-implicit-function-declaration
$ /c/u/br2/bin/llvm-readelf -r test.o                                           
...
0000000000000041  0000000600000004 R_X86_64_PLT32         0000000000000000 .text.foo - 4
...

$ /c/u/br2/bin/clang main.c -fsemantic-interposition -c -fpic -target x86_64-linux-gnu -o test.o -ffunction-sections -fdata-sections -fvisibility=hidden -Wno-implicit-function-declaration
$ /c/u/br2/bin/llvm-readelf -r test.o                                           
...
0000000000000041  0000000600000004 R_X86_64_PLT32         0000000000000000 .text.foo - 4
…
With -fpie:
$ /c/u/br2/bin/clang main.c -fno-semantic-interposition -c -fpie -target x86_64-linux-gnu -o test.o -ffunction-sections -fdata-sections -fvisibility=default -Wno-implicit-function-declaration
$ /c/u/br2/bin/llvm-readelf -r test.o
...
0000000000000041  0000000a00000004 R_X86_64_PLT32         0000000000000000 foo - 4
...

$ /c/u/br2/bin/clang main.c -fsemantic-interposition -c -fpie -target x86_64-linux-gnu -o test.o -ffunction-sections -fdata-sections -fvisibility=default -Wno-implicit-function-declaration
$ /c/u/br2/bin/llvm-readelf -r test.o                                           
...
0000000000000041  0000000a00000004 R_X86_64_PLT32         0000000000000000 foo - 4
...

$ /c/u/br2/bin/clang main.c -fno-semantic-interposition -c -fpie -target x86_64-linux-gnu -o test.o -ffunction-sections -fdata-sections -fvisibility=hidden -Wno-implicit-function-declaration
$ /c/u/br2/bin/llvm-readelf -r test.o                                           
...
0000000000000041  0000000a00000004 R_X86_64_PLT32         0000000000000000 foo - 4
...

$ /c/u/br2/bin/clang main.c -fsemantic-interposition -c -fpie -target x86_64-linux-gnu -o test.o -ffunction-sections -fdata-sections -fvisibility=hidden -Wno-implicit-function-declaration
$ /c/u/br2/bin/llvm-readelf -r test.o                                           
...
0000000000000041  0000000a00000004 R_X86_64_PLT32         0000000000000000 foo - 4
...
So, clang's behavior has changed so that --wrap no longer wraps symbol definitions for default symbols + -fpic + -fno-semantic-interposition (-fno-semantic-interposition is the default); however, you can restore the old behavior via -fsemantic-interposition. For hidden symbols + -fpic --wrap no longer wraps symbol definitions and there is no way to restore the old definition wrapping behavior.

Sorry for replying to my own post. I have realised that I made a mistake: -fno-semantic-interposition is not the default for -fpic, so there has only been a change in behaviour for hidden symbols:

$ /c/u/br2/bin/clang main.c -c -fpic -target x86_64-linux-gnu -o test.o -ffunction-sections -fdata-sections -fvisibility=hidden -Wno-implicit-function-declaration
bdunbobbin@BENDB-W10-3 MINGW64 /c/temp/interpos
$ /c/u/br2/bin/llvm-readelf -r test.o
…                     
0000000000000041  0000000600000004 R_X86_64_PLT32         0000000000000000 .text.foo - 4
…

$ /c/u/br2/bin/clang main.c -c -fpic -target x86_64-linux-gnu -o test.o -ffunction-sections -fdata-sections -fvisibility=default -Wno-implicit-function-declaration
$ /c/u/br2/bin/llvm-readelf -r test.o
…
0000000000000041  0000000a00000004 R_X86_64_PLT32         0000000000000000 foo - 4
...
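
The relocation dumps in this comment and the previous one can be condensed into one small function. This is only a summary of the outputs shown above for the clang build used there, targeting x86_64-linux-gnu; the names are hypothetical and it is not a description of how clang decides anything internally. For -fpic with no explicit flag, pass `semantic_interposition = true`, following the correction above.

```c
#include <stdbool.h>

enum pic_mode { MODE_FPIC, MODE_FPIE };
enum visibility { VIS_DEFAULT, VIS_HIDDEN };

/*
 * Returns the symbol named by the R_X86_64_PLT32 relocation for the call to
 * foo in main, as observed in the dumps: "foo" or the section symbol
 * ".text.foo".
 */
const char *observed_reloc_symbol(enum pic_mode mode, enum visibility vis,
                                  bool semantic_interposition)
{
    if (mode == MODE_FPIE)
        return "foo";        /* all four -fpie dumps reference foo */
    if (vis == VIS_HIDDEN)
        return ".text.foo";  /* -fpic + hidden: same result with either flag */
    /* -fpic + default visibility: the flag decides */
    return semantic_interposition ? "foo" : ".text.foo";
}
```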

In D73230#2207477, @bd1976llvm wrote:

So, clang's behavior has changed so that --wrap no longer wraps symbol definitions for default symbols + -fpic + -fno-semantic-interposition (-fno-semantic-interposition is the default); however, you can restore the old behavior via -fsemantic-interposition. For hidden symbols + -fpic --wrap no longer wraps symbol definitions and there is no way to restore the old definition wrapping behavior.

I think the summary is correct. -fvisibility=hidden nullifies -fsemantic-interposition when the definition is available in the same translation unit. Given how GCC and GNU ld handle/document it, I'd say the previous hidden clang behavior working with -Wl,--wrap=foo is accidental rather than intentional. If you don't pass explicit -fsemantic-interposition, in -fPIC mode clang can freely inline foo into call sites, which will also defeat the intended -Wl,--wrap=foo behavior.

Any undefined reference to symbol will be resolved to "__wrap_symbol".

I think a reasonably portable way to make the wrapping scheme work is __attribute__((weak)). An alternative is to move the definitions to a separate translation unit (it does not work with -r or GCC LTO, though).
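
A minimal sketch of the `__attribute__((weak))` suggestion, assuming GCC or Clang on an ELF target. To keep it checkable without any linker flags, foo returns a string instead of calling puts, and the helper `call_foo` is a hypothetical name. Whether a given linker then redirects the call under -Wl,--wrap=foo is exactly what this thread discusses, so that part should be verified with your own toolchain.

```c
/* Weak definition: another definition may replace this one at link time,
   so the compiler does not inline this body into its callers. */
__attribute__((weak)) const char *foo(void)
{
    return "foo()";
}

/* Hypothetical caller in the same translation unit as the definition. */
const char *call_foo(void)
{
    return foo();
}
```

Built without --wrap, `call_foo()` returns "foo()". A link with -Wl,--wrap=foo would additionally need a `__wrap_foo` definition.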

In D73230#2207634, @MaskRay wrote:

I think the summary is correct. -fvisibility=hidden nullifies -fsemantic-interposition when the definition is available in the same translation unit. Given how GCC and GNU ld handle/document it, I'd say the previous hidden clang behavior working with -Wl,--wrap=foo is accidental rather than intentional. If you don't pass explicit -fsemantic-interposition, in -fPIC mode clang can freely inline foo into call sites, which will also defeat the intended -Wl,--wrap=foo behavior.

Any undefined reference to symbol will be resolved to "__wrap_symbol".

I think a reasonably portable way to make the wrapping scheme work is __attribute__((weak)). An alternative is to move the definitions to a separate translation unit (it does not work with -r or GCC LTO, though).

... but now we have this difference in behaviour for normal vs -flto links:

ben@ben-VirtualBox:~/tests/wrap$ cat smaller.c 
void __wrap_foo () {
	puts ("__wrap_foo");
	__real_foo();
}

void foo () { puts("foo()"); }

int main() { foo(); }
ben@ben-VirtualBox:~/tests/wrap$ clang smaller.c -Wno-implicit-function-declaration -fpic -ffunction-sections -Wl,--wrap=foo -o old.elf -fvisibility=hidden -fuse-ld=lld
ben@ben-VirtualBox:~/tests/wrap$ ./old.elf
__wrap_foo
foo()
ben@ben-VirtualBox:~/tests/wrap$ clang smaller.c -Wno-implicit-function-declaration -fpic -ffunction-sections -Wl,--wrap=foo -o old_lto.elf -fvisibility=hidden -fuse-ld=lld -flto
ben@ben-VirtualBox:~/tests/wrap$ ./old_lto.elf
__wrap_foo
foo()
ben@ben-VirtualBox:~/tests/wrap$ ~/u/build/bin/clang smaller.c -Wno-implicit-function-declaration -fpic -ffunction-sections -Wl,--wrap=foo -o new.elf -fvisibility=hidden -fuse-ld=lld
ben@ben-VirtualBox:~/tests/wrap$ ./new.elf
foo()
ben@ben-VirtualBox:~/tests/wrap$ ~/u/build/bin/clang smaller.c -Wno-implicit-function-declaration -fpic -ffunction-sections -Wl,--wrap=foo -o new_lto.elf -fvisibility=hidden -fuse-ld=lld -flto
ben@ben-VirtualBox:~/tests/wrap$ ./new_lto.elf
__wrap_foo
foo()

... and the same will be true for default symbols + -fpic when we enable -fno-semantic-interposition by default for -fpic :(

In D73230#2207735, @bd1976llvm wrote:
.. but now we have this difference in behaviour for -normal vs flto links:
ben@ben-VirtualBox:~/tests/wrap$ cat smaller.c 
void __wrap_foo () {
	puts ("__wrap_foo");
	__real_foo();
}

void foo () { puts("foo()"); }

int main() { foo(); }
ben@ben-VirtualBox:~/tests/wrap$ clang smaller.c -Wno-implicit-function-declaration -fpic -ffunction-sections -Wl,--wrap=foo -o old.elf -fvisibility=hidden -fuse-ld=lld
ben@ben-VirtualBox:~/tests/wrap$ ./old.elf
__wrap_foo
foo()
ben@ben-VirtualBox:~/tests/wrap$ clang smaller.c -Wno-implicit-function-declaration -fpic -ffunction-sections -Wl,--wrap=foo -o old_lto.elf -fvisibility=hidden -fuse-ld=lld -flto
ben@ben-VirtualBox:~/tests/wrap$ ./old_lto.elf
__wrap_foo
foo()
ben@ben-VirtualBox:~/tests/wrap$ ~/u/build/bin/clang smaller.c -Wno-implicit-function-declaration -fpic -ffunction-sections -Wl,--wrap=foo -o new.elf -fvisibility=hidden -fuse-ld=lld
ben@ben-VirtualBox:~/tests/wrap$ ./new.elf
foo()
ben@ben-VirtualBox:~/tests/wrap$ ~/u/build/bin/clang smaller.c -Wno-implicit-function-declaration -fpic -ffunction-sections -Wl,--wrap=foo -o new_lto.elf -fvisibility=hidden -fuse-ld=lld -flto
ben@ben-VirtualBox:~/tests/wrap$ ./new_lto.elf
__wrap_foo
foo()
... and the same will be true for default symbols + -fpic when we enable -fno-semantic-interposition by default for -fpic :(

I think the non-LTO vs. full LTO difference comes from the way we implement --wrap for LTO: D33621.
It gives the wrapped symbol WeakAny linkage, which is what produces the wrapping behavior.

clang -fuse-ld=lld smaller.c -Wno-implicit-function-declaration -fpic -ffunction-sections -Wl,--wrap=foo -fvisibility=hidden -flto -Wl,--save-temps

% llvm-dis < a.out.0.0.preopt.bc
...
define weak hidden void @foo() #0 {  ; Note the weakany linkage
...

I think the only way to make --wrap sufficiently reliable is to communicate --wrap to the LTO code generation. That was not considered worthwhile at implementation time: https://bugs.llvm.org/show_bug.cgi?id=33145#c7

To make the non-LTO case behave like the LTO case, add __attribute__((weak)) to the definition.

In D73230#2207837, @MaskRay wrote:
I think the only way to make --wrap sufficiently reliable is to communicate --wrap to the LTO code generation. That was not considered worthwhile at implementation time: https://bugs.llvm.org/show_bug.cgi?id=33145#c7

To make the non-LTO case behave like the LTO case, add __attribute__((weak)) to the definition.

I had come to the same conclusion and I agree with your assessment of what is needed to improve the LTO behavior.

Another way of obtaining consistent behavior would be to adjust the current canBenefitFromLocalAlias():

 bool GlobalValue::canBenefitFromLocalAlias() const {
   // See AsmPrinter::getSymbolPreferLocal().
-  return GlobalObject::isExternalLinkage(getLinkage()) && !isDeclaration() &&
+  return hasDefaultVisibility() && GlobalObject::isExternalLinkage(getLinkage()) && !isDeclaration() &&
          !isa<GlobalIFunc>(this) && !hasComdat();
 }

Doing this would:

1. Make the code read a bit better (it was not clear to me without going back to the code reviews why this function doesn't consider visibility).
2. Remove the introduced difference in behavior for --wrap and hidden definitions.
3. Remove the difference in --wrap behavior between LTO and normal builds.
4. Improve the size of object files with hidden symbols (and most symbols are hidden now IIUC) by preventing the need for superfluous STT_SECTION symbols.
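
To make the effect of the extra check concrete, here is a minimal C model of the predicate before and after the proposed change. The struct, the enums, and the function names are hypothetical stand-ins invented for illustration and are not LLVM's API; only the shape of the condition follows the diff above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the properties the predicate reads. */
enum model_linkage { LINKAGE_EXTERNAL, LINKAGE_WEAK_ANY, LINKAGE_INTERNAL };
enum model_visibility { VIS_DEFAULT, VIS_HIDDEN, VIS_PROTECTED };

struct global_model {
    enum model_linkage linkage;
    enum model_visibility visibility;
    bool is_declaration;
    bool is_ifunc;
    bool has_comdat;
};

/* Condition before the proposed change: visibility is not consulted. */
bool can_benefit_before(const struct global_model *g) {
    assert(g);
    return g->linkage == LINKAGE_EXTERNAL && !g->is_declaration &&
           !g->is_ifunc && !g->has_comdat;
}

/* Condition with the proposed default-visibility check added in front. */
bool can_benefit_after(const struct global_model *g) {
    assert(g);
    return g->visibility == VIS_DEFAULT && can_benefit_before(g);
}
```

Under this model a hidden, externally linked definition satisfies the old condition but not the new one, while a default-visibility definition satisfies both.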

In D73230#2208462, @bd1976llvm wrote:
I had come to the same conclusion and I agree with your assessment of what is needed to improve the LTO behavior.

Another way of obtaining consistent behavior would be to adjust the current canBenefitFromLocalAlias():
 bool GlobalValue::canBenefitFromLocalAlias() const {
   // See AsmPrinter::getSymbolPreferLocal().
-  return GlobalObject::isExternalLinkage(getLinkage()) && !isDeclaration() &&
+  return hasDefaultVisibility() && GlobalObject::isExternalLinkage(getLinkage()) && !isDeclaration() &&
          !isa<GlobalIFunc>(this) && !hasComdat();
 }
Doing this would:

1. Make the code read a bit better (it was not clear to me without going back to the code reviews why this function doesn't consider visibility).
2. Remove the introduced difference in behavior for --wrap and hidden definitions.
3. Remove the difference in --wrap behavior between LTO and normal builds.
4. Improve the size of object files with hidden symbols (and most symbols are hidden now IIUC) by preventing the need for superfluous STT_SECTION symbols.

The proposed canBenefitFromLocalAlias change looks good to me. Mind sending a patch? I think 4 is the important one. I agree that it happens to provide 2 and 3 here, but the --wrap behavior still looks unreliable to me. Glad that you found a solution that addresses your problems :)

In D73230#2208470, @MaskRay wrote:

The proposed canBenefitFromLocalAlias change looks good to me. Mind sending a patch?

Incoming; I'm just struggling with the testing a bit.

bd1976llvm mentioned this in D85782: [X86][ELF] Prefer lowering MC_GlobalAddress operands to .Lfoo$local only for STV_DEFAULT globals. · Aug 11 2020, 2:10 PM

In D73230#2208470, @MaskRay wrote:

The proposed canBenefitFromLocalAlias change looks good to me. Mind sending a patch?

I have put up https://reviews.llvm.org/D85782.

Ben Dunbobbin <Ben.Dunbobbin@sony.com> mentioned this in rG4cb016cd2d84: [X86][ELF] Prefer lowering MC_GlobalAddress operands to .Lfoo$local for…. · Aug 13 2020, 4:09 PM

MaskRay mentioned this in D101872: [AArch64][ELF] Prefer to lower MC_GlobalAddress operands to .Lfoo$local. · May 4 2021, 4:43 PM

MaskRay mentioned this in D101875: [RISCV] Prefer to lower MC_GlobalAddress operands to .Lfoo$local. · May 4 2021, 5:25 PM

MaskRay mentioned this in rG6a2850f3fc24: [AArch64][ELF] Prefer to lower MC_GlobalAddress operands to .Lfoo$local. · May 7 2021, 9:44 AM

MaskRay mentioned this in rGec27c5f17044: [RISCV] Prefer to lower MC_GlobalAddress operands to .Lfoo$local. · May 11 2021, 11:29 AM

Revision Contents

Path: llvm/lib/Target/X86/X86MCInstLower.cpp
Size: 4 lines

Path: llvm/test/CodeGen/X86/ (10 files)
Size: 36 lines, 4 lines, 2 lines, 12 lines, 164 lines, 2 lines, 2 lines, 2 lines, 2 lines, 8 lines

Diff 241621

llvm/lib/Target/X86/X86MCInstLower.cpp

...
 MachineModuleInfoMachO &X86MCInstLower::getMachOMMI() const {
   return MF.getMMI().getObjFileInfo<MachineModuleInfoMachO>();
 }

 /// GetSymbolFromOperand - Lower an MO_GlobalAddress or MO_ExternalSymbol
 /// operand to an MCSymbol.
 MCSymbol *X86MCInstLower::GetSymbolFromOperand(const MachineOperand &MO) const {
+  const Triple &TT = TM.getTargetTriple();
+  if (MO.isGlobal() && TT.isOSBinFormatELF())
+    return AsmPrinter.getSymbolPreferLocal(*MO.getGlobal());

   const DataLayout &DL = MF.getDataLayout();
   assert((MO.isGlobal() || MO.isSymbol() || MO.isMBB()) &&
          "Isn't a symbol reference");

   MCSymbol *Sym = nullptr;
   SmallString<128> Name;
   StringRef Suffix;
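
As a rough illustration of what the updated test expectations in this revision check, the following C sketch models the symbol choice for a global operand: a strong, externally linked, dso_local definition on ELF is referenced through a `.L<name>$local` alias, and everything else keeps its plain name. The struct and function here are hypothetical and only mirror the naming pattern visible in the tests; the real logic lives in AsmPrinter::getSymbolPreferLocal() and may consider more conditions than this sketch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical description of a global operand; not an LLVM type. */
struct operand_model {
    const char *name;           /* IR name, e.g. "global_data" */
    bool is_elf;                /* object format is ELF */
    bool is_dso_local;          /* global is marked dso_local */
    bool is_strong_definition;  /* external linkage and defined here */
};

/* Writes the symbol the operand would reference into buf and returns the
   length snprintf reports (the full name length, excluding the NUL). */
int preferred_symbol(const struct operand_model *op, char *buf, size_t size) {
    assert(op && op->name && buf);
    if (op->is_elf && op->is_dso_local && op->is_strong_definition)
        return snprintf(buf, size, ".L%s$local", op->name);
    return snprintf(buf, size, "%s", op->name);
}
```

For example, a model of `@strong_local_global` would yield `.Lstrong_local_global$local`, while a model of the weak `@weak_local_global` would keep `weak_local_global`, matching the CHECK lines in linux-preemption.ll.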

...

llvm/test/CodeGen/X86/code-model-elf.ll

...
 ; LARGE-PIC-NEXT: addq %rcx, %rax
 ; LARGE-PIC-NEXT: retq
   ret i32* getelementptr inbounds ([10 x i32], [10 x i32]* @static_data, i64 0, i64 0)
 }

 define dso_local i32* @lea_global_data() #0 {
 ; SMALL-STATIC-LABEL: lea_global_data:
 ; SMALL-STATIC: # %bb.0:
-; SMALL-STATIC-NEXT: movl $global_data, %eax
+; SMALL-STATIC-NEXT: movl $.Lglobal_data$local, %eax
 ; SMALL-STATIC-NEXT: retq
 ;
 ; MEDIUM-STATIC-LABEL: lea_global_data:
 ; MEDIUM-STATIC: # %bb.0:
-; MEDIUM-STATIC-NEXT: movabsq $global_data, %rax
+; MEDIUM-STATIC-NEXT: movabsq $.Lglobal_data$local, %rax
 ; MEDIUM-STATIC-NEXT: retq
 ;
 ; LARGE-STATIC-LABEL: lea_global_data:
 ; LARGE-STATIC: # %bb.0:
-; LARGE-STATIC-NEXT: movabsq $global_data, %rax
+; LARGE-STATIC-NEXT: movabsq $.Lglobal_data$local, %rax
 ; LARGE-STATIC-NEXT: retq
 ;
 ; SMALL-PIC-LABEL: lea_global_data:
 ; SMALL-PIC: # %bb.0:
-; SMALL-PIC-NEXT: leaq global_data(%rip), %rax
+; SMALL-PIC-NEXT: leaq .Lglobal_data$local(%rip), %rax
 ; SMALL-PIC-NEXT: retq
 ;
 ; MEDIUM-PIC-LABEL: lea_global_data:
 ; MEDIUM-PIC: # %bb.0:
 ; MEDIUM-PIC-NEXT: leaq _GLOBAL_OFFSET_TABLE_(%rip), %rcx
-; MEDIUM-PIC-NEXT: movabsq $global_data@GOTOFF, %rax
+; MEDIUM-PIC-NEXT: movabsq $.Lglobal_data$local@GOTOFF, %rax
 ; MEDIUM-PIC-NEXT: addq %rcx, %rax
 ; MEDIUM-PIC-NEXT: retq
 ;
 ; LARGE-PIC-LABEL: lea_global_data:
 ; LARGE-PIC: # %bb.0:
 ; LARGE-PIC-NEXT: .L1$pb:
 ; LARGE-PIC-NEXT: leaq .L1$pb(%rip), %rax
 ; LARGE-PIC-NEXT: movabsq $_GLOBAL_OFFSET_TABLE_-.L1$pb, %rcx
 ; LARGE-PIC-NEXT: addq %rax, %rcx
-; LARGE-PIC-NEXT: movabsq $global_data@GOTOFF, %rax
+; LARGE-PIC-NEXT: movabsq $.Lglobal_data$local@GOTOFF, %rax
 ; LARGE-PIC-NEXT: addq %rcx, %rax
 ; LARGE-PIC-NEXT: retq
   ret i32* getelementptr inbounds ([10 x i32], [10 x i32]* @global_data, i64 0, i64 0)
 }

 define dso_local i32* @lea_extern_data() #0 {
 ; SMALL-STATIC-LABEL: lea_extern_data:
 ; SMALL-STATIC: # %bb.0:
...
 ; LARGE-PIC-NEXT: movq (%rcx,%rax), %rax
 ; LARGE-PIC-NEXT: retq
   ret i32* getelementptr inbounds ([10 x i32], [10 x i32]* @extern_data, i64 0, i64 0)
 }

 define dso_local i32 @load_global_data() #0 {
 ; SMALL-STATIC-LABEL: load_global_data:
 ; SMALL-STATIC: # %bb.0:
-; SMALL-STATIC-NEXT: movl global_data+8(%rip), %eax
+; SMALL-STATIC-NEXT: movl .Lglobal_data$local+8(%rip), %eax
 ; SMALL-STATIC-NEXT: retq
 ;
 ; MEDIUM-STATIC-LABEL: load_global_data:
 ; MEDIUM-STATIC: # %bb.0:
-; MEDIUM-STATIC-NEXT: movabsq $global_data, %rax
+; MEDIUM-STATIC-NEXT: movabsq $.Lglobal_data$local, %rax
 ; MEDIUM-STATIC-NEXT: movl 8(%rax), %eax
 ; MEDIUM-STATIC-NEXT: retq
 ;
 ; LARGE-STATIC-LABEL: load_global_data:
 ; LARGE-STATIC: # %bb.0:
-; LARGE-STATIC-NEXT: movabsq $global_data, %rax
+; LARGE-STATIC-NEXT: movabsq $.Lglobal_data$local, %rax
 ; LARGE-STATIC-NEXT: movl 8(%rax), %eax
 ; LARGE-STATIC-NEXT: retq
 ;
 ; SMALL-PIC-LABEL: load_global_data:
 ; SMALL-PIC: # %bb.0:
-; SMALL-PIC-NEXT: movl global_data+8(%rip), %eax
+; SMALL-PIC-NEXT: movl .Lglobal_data$local+8(%rip), %eax
 ; SMALL-PIC-NEXT: retq
 ;
 ; MEDIUM-PIC-LABEL: load_global_data:
 ; MEDIUM-PIC: # %bb.0:
 ; MEDIUM-PIC-NEXT: leaq _GLOBAL_OFFSET_TABLE_(%rip), %rax
-; MEDIUM-PIC-NEXT: movabsq $global_data@GOTOFF, %rcx
+; MEDIUM-PIC-NEXT: movabsq $.Lglobal_data$local@GOTOFF, %rcx
 ; MEDIUM-PIC-NEXT: movl 8(%rax,%rcx), %eax
 ; MEDIUM-PIC-NEXT: retq
 ;
 ; LARGE-PIC-LABEL: load_global_data:
 ; LARGE-PIC: # %bb.0:
 ; LARGE-PIC-NEXT: .L3$pb:
 ; LARGE-PIC-NEXT: leaq .L3$pb(%rip), %rax
 ; LARGE-PIC-NEXT: movabsq $_GLOBAL_OFFSET_TABLE_-.L3$pb, %rcx
 ; LARGE-PIC-NEXT: addq %rax, %rcx
-; LARGE-PIC-NEXT: movabsq $global_data@GOTOFF, %rax
+; LARGE-PIC-NEXT: movabsq $.Lglobal_data$local@GOTOFF, %rax
 ; LARGE-PIC-NEXT: movl 8(%rcx,%rax), %eax
 ; LARGE-PIC-NEXT: retq
   %rv = load i32, i32* getelementptr inbounds ([10 x i32], [10 x i32]* @global_data, i64 0, i64 2)
   ret i32 %rv
 }

 define dso_local i32 @load_extern_data() #0 {
 ; SMALL-STATIC-LABEL: load_extern_data:
...
 ; LARGE-PIC-NEXT: addq %rcx, %rax
 ; LARGE-PIC-NEXT: retq
   ret void ()* @static_fn
 }

 define dso_local void ()* @lea_global_fn() #0 {
 ; SMALL-STATIC-LABEL: lea_global_fn:
 ; SMALL-STATIC: # %bb.0:
-; SMALL-STATIC-NEXT: movl $global_fn, %eax
+; SMALL-STATIC-NEXT: movl $.Lglobal_fn$local, %eax
 ; SMALL-STATIC-NEXT: retq
 ;
 ; MEDIUM-STATIC-LABEL: lea_global_fn:
 ; MEDIUM-STATIC: # %bb.0:
-; MEDIUM-STATIC-NEXT: movabsq $global_fn, %rax
+; MEDIUM-STATIC-NEXT: movabsq $.Lglobal_fn$local, %rax
 ; MEDIUM-STATIC-NEXT: retq
 ;
 ; LARGE-STATIC-LABEL: lea_global_fn:
 ; LARGE-STATIC: # %bb.0:
-; LARGE-STATIC-NEXT: movabsq $global_fn, %rax
+; LARGE-STATIC-NEXT: movabsq $.Lglobal_fn$local, %rax
 ; LARGE-STATIC-NEXT: retq
 ;
 ; SMALL-PIC-LABEL: lea_global_fn:
 ; SMALL-PIC: # %bb.0:
-; SMALL-PIC-NEXT: leaq global_fn(%rip), %rax
+; SMALL-PIC-NEXT: leaq .Lglobal_fn$local(%rip), %rax
 ; SMALL-PIC-NEXT: retq
 ;
 ; MEDIUM-PIC-LABEL: lea_global_fn:
 ; MEDIUM-PIC: # %bb.0:
-; MEDIUM-PIC-NEXT: movabsq $global_fn, %rax
+; MEDIUM-PIC-NEXT: movabsq $.Lglobal_fn$local, %rax
 ; MEDIUM-PIC-NEXT: retq
 ;
 ; LARGE-PIC-LABEL: lea_global_fn:
 ; LARGE-PIC: # %bb.0:
 ; LARGE-PIC-NEXT: .L8$pb:
 ; LARGE-PIC-NEXT: leaq .L8$pb(%rip), %rax
 ; LARGE-PIC-NEXT: movabsq $_GLOBAL_OFFSET_TABLE_-.L8$pb, %rcx
 ; LARGE-PIC-NEXT: addq %rax, %rcx
-; LARGE-PIC-NEXT: movabsq $global_fn@GOTOFF, %rax
+; LARGE-PIC-NEXT: movabsq $.Lglobal_fn$local@GOTOFF, %rax
 ; LARGE-PIC-NEXT: addq %rcx, %rax
 ; LARGE-PIC-NEXT: retq
   ret void ()* @global_fn
 }

 define dso_local void ()* @lea_extern_fn() #0 {
 ; SMALL-STATIC-LABEL: lea_extern_fn:
 ; SMALL-STATIC: # %bb.0:
...

llvm/test/CodeGen/X86/emutls.ll

...
 ; X32-NEXT: retl

 entry:
   ret i32* @i3
 }

 define i32 @f7() {
 ; X32-LABEL: f7:
-; X32: movl $__emutls_v.i4, (%esp)
+; X32: movl $.L__emutls_v.i4$local, (%esp)
 ; X32-NEXT: calll __emutls_get_address
 ; X32-NEXT: movl (%eax), %eax
 ; X32-NEXT: addl $12, %esp
 ; X32-NEXT: .cfi_def_cfa_offset 4
 ; X32-NEXT: retl

 entry:
   %tmp1 = load i32, i32* @i4
   ret i32 %tmp1
 }

 define i32* @f8() {
 ; X32-LABEL: f8:
-; X32: movl $__emutls_v.i4, (%esp)
+; X32: movl $.L__emutls_v.i4$local, (%esp)
 ; X32-NEXT: calll __emutls_get_address
 ; X32-NEXT: addl $12, %esp
 ; X32-NEXT: .cfi_def_cfa_offset 4
 ; X32-NEXT: retl

 entry:
   ret i32* @i4
 }
...

llvm/test/CodeGen/X86/fold-add-pcrel.ll

 ; RUN: llc -mtriple=x86_64 -relocation-model=static < %s | FileCheck --check-prefixes=CHECK,STATIC %s
 ; RUN: llc -mtriple=x86_64 -relocation-model=pic < %s | FileCheck --check-prefixes=CHECK,PIC %s
 ; RUN: llc -mtriple=x86_64 -code-model=medium -relocation-model=static < %s | FileCheck --check-prefixes=CHECK,MSTATIC %s
 ; RUN: llc -mtriple=x86_64 -code-model=medium -relocation-model=pic < %s | FileCheck --check-prefixes=CHECK,MPIC %s

-@foo = dso_local global i32 0
+@foo = internal global i32 0

 define dso_local i64 @zero() {
 ; CHECK-LABEL: zero:
 ; CHECK: # %bb.0:
 ; STATIC-NEXT: movl $foo, %eax
 ; STATIC-NEXT: retq
 ; PIC-NEXT: leaq foo(%rip), %rax
 ; PIC-NEXT: retq
...

llvm/test/CodeGen/X86/linux-preemption.ll

...
 ; CHECK: movq external_default_global@GOTPCREL(%rip), %rax
 ; STATIC: movl $external_default_global, %eax
 ; CHECK32: movl external_default_global@GOT(%eax), %eax

 @strong_local_global = dso_local global i32 42
 define i32* @get_strong_local_global() {
   ret i32* @strong_local_global
 }
-; CHECK: leaq strong_local_global(%rip), %rax
-; STATIC: movl $strong_local_global, %eax
-; CHECK32: leal strong_local_global@GOTOFF(%eax), %eax
+; CHECK: leaq .Lstrong_local_global$local(%rip), %rax
+; STATIC: movl $.Lstrong_local_global$local, %eax
+; CHECK32: leal .Lstrong_local_global$local@GOTOFF(%eax), %eax

 @weak_local_global = weak dso_local global i32 42
 define i32* @get_weak_local_global() {
   ret i32* @weak_local_global
 }
 ; CHECK: leaq weak_local_global(%rip), %rax
 ; STATIC: movl $weak_local_global, %eax
 ; CHECK32: leal weak_local_global@GOTOFF(%eax), %eax
...
 define dso_local void @strong_local_function() {
   ret void
 }
 define void()* @get_strong_local_function() {
   ret void()* @strong_local_function
 }
 ; COMMON: {{^}}strong_local_function:
 ; COMMON-NEXT .Lstrong_local_function:
-; CHECK: leaq strong_local_function(%rip), %rax
-; STATIC: movl $strong_local_function, %eax
-; CHECK32: leal strong_local_function@GOTOFF(%eax), %eax
+; CHECK: leaq .Lstrong_local_function$local(%rip), %rax
+; STATIC: movl $.Lstrong_local_function$local, %eax
+; CHECK32: leal .Lstrong_local_function$local@GOTOFF(%eax), %eax

 define weak dso_local void @weak_local_function() {
   ret void
 }
 define void()* @get_weak_local_function() {
   ret void()* @weak_local_function
 }
 ; CHECK: leaq weak_local_function(%rip), %rax
...

llvm/test/CodeGen/X86/oddsubvector.ll

...

 @b = dso_local local_unnamed_addr global i32 0, align 4
 @c = dso_local local_unnamed_addr global [49 x i32] zeroinitializer, align 16
 @d = dso_local local_unnamed_addr global [49 x i32] zeroinitializer, align 16

 define void @PR42833() {
 ; SSE2-LABEL: PR42833:
 ; SSE2: # %bb.0:
-; SSE2-NEXT: movdqa c+{{.*}}(%rip), %xmm1
-; SSE2-NEXT: movdqa c+{{.*}}(%rip), %xmm0
+; SSE2-NEXT: movdqa .Lc$local+{{.*}}(%rip), %xmm1
+; SSE2-NEXT: movdqa .Lc$local+{{.*}}(%rip), %xmm0
 ; SSE2-NEXT: movd %xmm0, %eax
 ; SSE2-NEXT: addl {{.*}}(%rip), %eax
 ; SSE2-NEXT: movd %eax, %xmm2
 ; SSE2-NEXT: movaps {{.*#+}} xmm3 = <u,1,1,1>
 ; SSE2-NEXT: movss {{.*#+}} xmm3 = xmm2[0],xmm3[1,2,3]
 ; SSE2-NEXT: movdqa %xmm0, %xmm4
 ; SSE2-NEXT: paddd %xmm3, %xmm4
 ; SSE2-NEXT: pslld $23, %xmm3
 ; SSE2-NEXT: paddd {{.*}}(%rip), %xmm3
 ; SSE2-NEXT: cvttps2dq %xmm3, %xmm3
 ; SSE2-NEXT: movdqa %xmm0, %xmm5
 ; SSE2-NEXT: pmuludq %xmm3, %xmm5
 ; SSE2-NEXT: pshufd {{.*#+}} xmm5 = xmm5[0,2,2,3]
 ; SSE2-NEXT: pshufd {{.*#+}} xmm3 = xmm3[1,1,3,3]
 ; SSE2-NEXT: pshufd {{.*#+}} xmm6 = xmm0[1,1,3,3]
 ; SSE2-NEXT: pmuludq %xmm3, %xmm6
 ; SSE2-NEXT: pshufd {{.*#+}} xmm3 = xmm6[0,2,2,3]
 ; SSE2-NEXT: punpckldq {{.*#+}} xmm5 = xmm5[0],xmm3[0],xmm5[1],xmm3[1]
 ; SSE2-NEXT: movss {{.*#+}} xmm5 = xmm4[0],xmm5[1,2,3]
-; SSE2-NEXT: movdqa d+{{.*}}(%rip), %xmm3
+; SSE2-NEXT: movdqa .Ld$local+{{.*}}(%rip), %xmm3
 ; SSE2-NEXT: psubd %xmm1, %xmm3
 ; SSE2-NEXT: paddd %xmm1, %xmm1
-; SSE2-NEXT: movdqa %xmm1, c+{{.*}}(%rip)
-; SSE2-NEXT: movaps %xmm5, c+{{.*}}(%rip)
-; SSE2-NEXT: movdqa c+{{.*}}(%rip), %xmm1
-; SSE2-NEXT: movdqa c+{{.*}}(%rip), %xmm4
-; SSE2-NEXT: movdqa d+{{.*}}(%rip), %xmm5
-; SSE2-NEXT: movdqa d+{{.*}}(%rip), %xmm6
-; SSE2-NEXT: movdqa d+{{.*}}(%rip), %xmm7
+; SSE2-NEXT: movdqa %xmm1, .Lc$local+{{.*}}(%rip)
+; SSE2-NEXT: movaps %xmm5, .Lc$local+{{.*}}(%rip)
+; SSE2-NEXT: movdqa .Lc$local+{{.*}}(%rip), %xmm1
+; SSE2-NEXT: movdqa .Lc$local+{{.*}}(%rip), %xmm4
+; SSE2-NEXT: movdqa .Ld$local+{{.*}}(%rip), %xmm5
+; SSE2-NEXT: movdqa .Ld$local+{{.*}}(%rip), %xmm6
+; SSE2-NEXT: movdqa .Ld$local+{{.*}}(%rip), %xmm7
 ; SSE2-NEXT: movss {{.*#+}} xmm0 = xmm2[0],xmm0[1,2,3]
 ; SSE2-NEXT: psubd %xmm0, %xmm7
 ; SSE2-NEXT: psubd %xmm4, %xmm6
 ; SSE2-NEXT: psubd %xmm1, %xmm5
-; SSE2-NEXT: movdqa %xmm5, d+{{.*}}(%rip)
-; SSE2-NEXT: movdqa %xmm6, d+{{.*}}(%rip)
-; SSE2-NEXT: movdqa %xmm3, d+{{.*}}(%rip)
-; SSE2-NEXT: movdqa %xmm7, d+{{.*}}(%rip)
+; SSE2-NEXT: movdqa %xmm5, .Ld$local+{{.*}}(%rip)
+; SSE2-NEXT: movdqa %xmm6, .Ld$local+{{.*}}(%rip)
+; SSE2-NEXT: movdqa %xmm3, .Ld$local+{{.*}}(%rip)
+; SSE2-NEXT: movdqa %xmm7, .Ld$local+{{.*}}(%rip)
 ; SSE2-NEXT: paddd %xmm4, %xmm4
 ; SSE2-NEXT: paddd %xmm1, %xmm1
-; SSE2-NEXT: movdqa %xmm1, c+{{.*}}(%rip)
-; SSE2-NEXT: movdqa %xmm4, c+{{.*}}(%rip)
+; SSE2-NEXT: movdqa %xmm1, .Lc$local+{{.*}}(%rip)
+; SSE2-NEXT: movdqa %xmm4, .Lc$local+{{.*}}(%rip)
 ; SSE2-NEXT: retq
 ;
 ; SSE42-LABEL: PR42833:
 ; SSE42: # %bb.0:
-; SSE42-NEXT: movdqa c+{{.*}}(%rip), %xmm1
-; SSE42-NEXT: movdqa c+{{.*}}(%rip), %xmm0
+; SSE42-NEXT: movdqa .Lc$local+{{.*}}(%rip), %xmm1
+; SSE42-NEXT: movdqa .Lc$local+{{.*}}(%rip), %xmm0
 ; SSE42-NEXT: movd %xmm0, %eax
 ; SSE42-NEXT: addl {{.*}}(%rip), %eax
 ; SSE42-NEXT: movdqa {{.*#+}} xmm2 = <u,1,1,1>
	; SSE42-NEXT: pinsrd $0, %eax, %xmm2			; SSE42-NEXT: pinsrd $0, %eax, %xmm2
	; SSE42-NEXT: movdqa %xmm0, %xmm3			; SSE42-NEXT: movdqa %xmm0, %xmm3
	; SSE42-NEXT: paddd %xmm2, %xmm3			; SSE42-NEXT: paddd %xmm2, %xmm3
	; SSE42-NEXT: pslld $23, %xmm2			; SSE42-NEXT: pslld $23, %xmm2
	; SSE42-NEXT: paddd {{.*}}(%rip), %xmm2			; SSE42-NEXT: paddd {{.*}}(%rip), %xmm2
	; SSE42-NEXT: cvttps2dq %xmm2, %xmm2			; SSE42-NEXT: cvttps2dq %xmm2, %xmm2
	; SSE42-NEXT: pmulld %xmm0, %xmm2			; SSE42-NEXT: pmulld %xmm0, %xmm2
	; SSE42-NEXT: pblendw {{.*#+}} xmm2 = xmm3[0,1],xmm2[2,3,4,5,6,7]			; SSE42-NEXT: pblendw {{.*#+}} xmm2 = xmm3[0,1],xmm2[2,3,4,5,6,7]
	; SSE42-NEXT: movdqa d+{{.*}}(%rip), %xmm3			; SSE42-NEXT: movdqa .Ld$local+{{.*}}(%rip), %xmm3
	; SSE42-NEXT: psubd %xmm1, %xmm3			; SSE42-NEXT: psubd %xmm1, %xmm3
	; SSE42-NEXT: paddd %xmm1, %xmm1			; SSE42-NEXT: paddd %xmm1, %xmm1
	; SSE42-NEXT: movdqa %xmm1, c+{{.*}}(%rip)			; SSE42-NEXT: movdqa %xmm1, .Lc$local+{{.*}}(%rip)
	; SSE42-NEXT: movdqa %xmm2, c+{{.*}}(%rip)			; SSE42-NEXT: movdqa %xmm2, .Lc$local+{{.*}}(%rip)
	; SSE42-NEXT: movdqa c+{{.*}}(%rip), %xmm1			; SSE42-NEXT: movdqa .Lc$local+{{.*}}(%rip), %xmm1
	; SSE42-NEXT: movdqa c+{{.*}}(%rip), %xmm2			; SSE42-NEXT: movdqa .Lc$local+{{.*}}(%rip), %xmm2
	; SSE42-NEXT: movdqa d+{{.*}}(%rip), %xmm4			; SSE42-NEXT: movdqa .Ld$local+{{.*}}(%rip), %xmm4
	; SSE42-NEXT: movdqa d+{{.*}}(%rip), %xmm5			; SSE42-NEXT: movdqa .Ld$local+{{.*}}(%rip), %xmm5
	; SSE42-NEXT: movdqa d+{{.*}}(%rip), %xmm6			; SSE42-NEXT: movdqa .Ld$local+{{.*}}(%rip), %xmm6
	; SSE42-NEXT: pinsrd $0, %eax, %xmm0			; SSE42-NEXT: pinsrd $0, %eax, %xmm0
	; SSE42-NEXT: psubd %xmm0, %xmm6			; SSE42-NEXT: psubd %xmm0, %xmm6
	; SSE42-NEXT: psubd %xmm2, %xmm5			; SSE42-NEXT: psubd %xmm2, %xmm5
	; SSE42-NEXT: psubd %xmm1, %xmm4			; SSE42-NEXT: psubd %xmm1, %xmm4
	; SSE42-NEXT: movdqa %xmm4, d+{{.*}}(%rip)			; SSE42-NEXT: movdqa %xmm4, .Ld$local+{{.*}}(%rip)
	; SSE42-NEXT: movdqa %xmm5, d+{{.*}}(%rip)			; SSE42-NEXT: movdqa %xmm5, .Ld$local+{{.*}}(%rip)
	; SSE42-NEXT: movdqa %xmm3, d+{{.*}}(%rip)			; SSE42-NEXT: movdqa %xmm3, .Ld$local+{{.*}}(%rip)
	; SSE42-NEXT: movdqa %xmm6, d+{{.*}}(%rip)			; SSE42-NEXT: movdqa %xmm6, .Ld$local+{{.*}}(%rip)
	; SSE42-NEXT: paddd %xmm2, %xmm2			; SSE42-NEXT: paddd %xmm2, %xmm2
	; SSE42-NEXT: paddd %xmm1, %xmm1			; SSE42-NEXT: paddd %xmm1, %xmm1
	; SSE42-NEXT: movdqa %xmm1, c+{{.*}}(%rip)			; SSE42-NEXT: movdqa %xmm1, .Lc$local+{{.*}}(%rip)
	; SSE42-NEXT: movdqa %xmm2, c+{{.*}}(%rip)			; SSE42-NEXT: movdqa %xmm2, .Lc$local+{{.*}}(%rip)
	; SSE42-NEXT: retq			; SSE42-NEXT: retq
	;			;
	; AVX1-LABEL: PR42833:			; AVX1-LABEL: PR42833:
	; AVX1: # %bb.0:			; AVX1: # %bb.0:
	; AVX1-NEXT: vmovdqa c+{{.*}}(%rip), %xmm0			; AVX1-NEXT: vmovdqa .Lc$local+{{.*}}(%rip), %xmm0
	; AVX1-NEXT: vmovd %xmm0, %eax			; AVX1-NEXT: vmovd %xmm0, %eax
	; AVX1-NEXT: addl {{.*}}(%rip), %eax			; AVX1-NEXT: addl {{.*}}(%rip), %eax
	; AVX1-NEXT: vmovdqa {{.*#+}} xmm1 = <u,1,1,1>			; AVX1-NEXT: vmovdqa {{.*#+}} xmm1 = <u,1,1,1>
	; AVX1-NEXT: vpinsrd $0, %eax, %xmm1, %xmm1			; AVX1-NEXT: vpinsrd $0, %eax, %xmm1, %xmm1
	; AVX1-NEXT: vpaddd %xmm1, %xmm0, %xmm2			; AVX1-NEXT: vpaddd %xmm1, %xmm0, %xmm2
	; AVX1-NEXT: vmovdqa c+{{.*}}(%rip), %xmm3			; AVX1-NEXT: vmovdqa .Lc$local+{{.*}}(%rip), %xmm3
	; AVX1-NEXT: vpslld $23, %xmm1, %xmm1			; AVX1-NEXT: vpslld $23, %xmm1, %xmm1
	; AVX1-NEXT: vpaddd {{.*}}(%rip), %xmm1, %xmm1			; AVX1-NEXT: vpaddd {{.*}}(%rip), %xmm1, %xmm1
	; AVX1-NEXT: vcvttps2dq %xmm1, %xmm1			; AVX1-NEXT: vcvttps2dq %xmm1, %xmm1
	; AVX1-NEXT: vpmulld %xmm1, %xmm0, %xmm1			; AVX1-NEXT: vpmulld %xmm1, %xmm0, %xmm1
	; AVX1-NEXT: vpslld $1, %xmm3, %xmm3			; AVX1-NEXT: vpslld $1, %xmm3, %xmm3
	; AVX1-NEXT: vinsertf128 $1, %xmm3, %ymm1, %ymm1			; AVX1-NEXT: vinsertf128 $1, %xmm3, %ymm1, %ymm1
	; AVX1-NEXT: vblendps {{.*#+}} ymm1 = ymm2[0],ymm1[1,2,3,4,5,6,7]			; AVX1-NEXT: vblendps {{.*#+}} ymm1 = ymm2[0],ymm1[1,2,3,4,5,6,7]
	; AVX1-NEXT: vmovdqa d+{{.*}}(%rip), %xmm2			; AVX1-NEXT: vmovdqa .Ld$local+{{.*}}(%rip), %xmm2
	; AVX1-NEXT: vpsubd c+{{.*}}(%rip), %xmm2, %xmm2			; AVX1-NEXT: vpsubd .Lc$local+{{.*}}(%rip), %xmm2, %xmm2
	; AVX1-NEXT: vmovups %ymm1, c+{{.*}}(%rip)			; AVX1-NEXT: vmovups %ymm1, .Lc$local+{{.*}}(%rip)
	; AVX1-NEXT: vpinsrd $0, %eax, %xmm0, %xmm0			; AVX1-NEXT: vpinsrd $0, %eax, %xmm0, %xmm0
	; AVX1-NEXT: vmovdqa d+{{.*}}(%rip), %xmm1			; AVX1-NEXT: vmovdqa .Ld$local+{{.*}}(%rip), %xmm1
	; AVX1-NEXT: vpsubd %xmm0, %xmm1, %xmm0			; AVX1-NEXT: vpsubd %xmm0, %xmm1, %xmm0
	; AVX1-NEXT: vmovdqa d+{{.*}}(%rip), %xmm1			; AVX1-NEXT: vmovdqa .Ld$local+{{.*}}(%rip), %xmm1
	; AVX1-NEXT: vmovdqa c+{{.*}}(%rip), %xmm3			; AVX1-NEXT: vmovdqa .Lc$local+{{.*}}(%rip), %xmm3
	; AVX1-NEXT: vpsubd %xmm3, %xmm1, %xmm1			; AVX1-NEXT: vpsubd %xmm3, %xmm1, %xmm1
	; AVX1-NEXT: vmovdqa d+{{.*}}(%rip), %xmm4			; AVX1-NEXT: vmovdqa .Ld$local+{{.*}}(%rip), %xmm4
	; AVX1-NEXT: vmovdqa c+{{.*}}(%rip), %xmm5			; AVX1-NEXT: vmovdqa .Lc$local+{{.*}}(%rip), %xmm5
	; AVX1-NEXT: vpsubd %xmm5, %xmm4, %xmm4			; AVX1-NEXT: vpsubd %xmm5, %xmm4, %xmm4
	; AVX1-NEXT: vmovdqa %xmm2, d+{{.*}}(%rip)			; AVX1-NEXT: vmovdqa %xmm2, .Ld$local+{{.*}}(%rip)
	; AVX1-NEXT: vmovdqa %xmm4, d+{{.*}}(%rip)			; AVX1-NEXT: vmovdqa %xmm4, .Ld$local+{{.*}}(%rip)
	; AVX1-NEXT: vmovdqa %xmm1, d+{{.*}}(%rip)			; AVX1-NEXT: vmovdqa %xmm1, .Ld$local+{{.*}}(%rip)
	; AVX1-NEXT: vmovdqa %xmm0, d+{{.*}}(%rip)			; AVX1-NEXT: vmovdqa %xmm0, .Ld$local+{{.*}}(%rip)
	; AVX1-NEXT: vpaddd %xmm3, %xmm3, %xmm0			; AVX1-NEXT: vpaddd %xmm3, %xmm3, %xmm0
	; AVX1-NEXT: vpaddd %xmm5, %xmm5, %xmm1			; AVX1-NEXT: vpaddd %xmm5, %xmm5, %xmm1
	; AVX1-NEXT: vmovdqa %xmm1, c+{{.*}}(%rip)			; AVX1-NEXT: vmovdqa %xmm1, .Lc$local+{{.*}}(%rip)
	; AVX1-NEXT: vmovdqa %xmm0, c+{{.*}}(%rip)			; AVX1-NEXT: vmovdqa %xmm0, .Lc$local+{{.*}}(%rip)
	; AVX1-NEXT: vzeroupper			; AVX1-NEXT: vzeroupper
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-LABEL: PR42833:			; AVX2-LABEL: PR42833:
	; AVX2: # %bb.0:			; AVX2: # %bb.0:
	; AVX2-NEXT: movl {{.*}}(%rip), %eax			; AVX2-NEXT: movl {{.*}}(%rip), %eax
	; AVX2-NEXT: vmovdqu c+{{.*}}(%rip), %ymm0			; AVX2-NEXT: vmovdqu .Lc$local+{{.*}}(%rip), %ymm0
	; AVX2-NEXT: addl c+{{.*}}(%rip), %eax			; AVX2-NEXT: addl .Lc$local+{{.*}}(%rip), %eax
	; AVX2-NEXT: vmovd %eax, %xmm1			; AVX2-NEXT: vmovd %eax, %xmm1
	; AVX2-NEXT: vpblendd {{.*#+}} ymm2 = ymm1[0],mem[1,2,3,4,5,6,7]			; AVX2-NEXT: vpblendd {{.*#+}} ymm2 = ymm1[0],mem[1,2,3,4,5,6,7]
	; AVX2-NEXT: vpaddd %ymm2, %ymm0, %ymm3			; AVX2-NEXT: vpaddd %ymm2, %ymm0, %ymm3
	; AVX2-NEXT: vpsllvd %ymm2, %ymm0, %ymm2			; AVX2-NEXT: vpsllvd %ymm2, %ymm0, %ymm2
	; AVX2-NEXT: vpblendd {{.*#+}} ymm2 = ymm3[0],ymm2[1,2,3,4,5,6,7]			; AVX2-NEXT: vpblendd {{.*#+}} ymm2 = ymm3[0],ymm2[1,2,3,4,5,6,7]
	; AVX2-NEXT: vmovdqu %ymm2, c+{{.*}}(%rip)			; AVX2-NEXT: vmovdqu %ymm2, .Lc$local+{{.*}}(%rip)
	; AVX2-NEXT: vmovdqu c+{{.*}}(%rip), %ymm2			; AVX2-NEXT: vmovdqu .Lc$local+{{.*}}(%rip), %ymm2
	; AVX2-NEXT: vmovdqu d+{{.*}}(%rip), %ymm3			; AVX2-NEXT: vmovdqu .Ld$local+{{.*}}(%rip), %ymm3
	; AVX2-NEXT: vmovdqu d+{{.*}}(%rip), %ymm4			; AVX2-NEXT: vmovdqu .Ld$local+{{.*}}(%rip), %ymm4
	; AVX2-NEXT: vpblendd {{.*#+}} ymm0 = ymm1[0],ymm0[1,2,3,4,5,6,7]			; AVX2-NEXT: vpblendd {{.*#+}} ymm0 = ymm1[0],ymm0[1,2,3,4,5,6,7]
	; AVX2-NEXT: vpsubd %ymm0, %ymm4, %ymm0			; AVX2-NEXT: vpsubd %ymm0, %ymm4, %ymm0
	; AVX2-NEXT: vpsubd %ymm2, %ymm3, %ymm1			; AVX2-NEXT: vpsubd %ymm2, %ymm3, %ymm1
	; AVX2-NEXT: vmovdqu %ymm1, d+{{.*}}(%rip)			; AVX2-NEXT: vmovdqu %ymm1, .Ld$local+{{.*}}(%rip)
	; AVX2-NEXT: vmovdqu %ymm0, d+{{.*}}(%rip)			; AVX2-NEXT: vmovdqu %ymm0, .Ld$local+{{.*}}(%rip)
	; AVX2-NEXT: vpaddd %ymm2, %ymm2, %ymm0			; AVX2-NEXT: vpaddd %ymm2, %ymm2, %ymm0
	; AVX2-NEXT: vmovdqu %ymm0, c+{{.*}}(%rip)			; AVX2-NEXT: vmovdqu %ymm0, .Lc$local+{{.*}}(%rip)
	; AVX2-NEXT: vzeroupper			; AVX2-NEXT: vzeroupper
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512-LABEL: PR42833:			; AVX512-LABEL: PR42833:
	; AVX512: # %bb.0:			; AVX512: # %bb.0:
	; AVX512-NEXT: movl {{.*}}(%rip), %eax			; AVX512-NEXT: movl {{.*}}(%rip), %eax
	; AVX512-NEXT: vmovdqu c+{{.*}}(%rip), %ymm0			; AVX512-NEXT: vmovdqu .Lc$local+{{.*}}(%rip), %ymm0
	; AVX512-NEXT: vmovdqu64 c+{{.*}}(%rip), %zmm1			; AVX512-NEXT: vmovdqu64 .Lc$local+{{.*}}(%rip), %zmm1
	; AVX512-NEXT: addl c+{{.*}}(%rip), %eax			; AVX512-NEXT: addl .Lc$local+{{.*}}(%rip), %eax
	; AVX512-NEXT: vmovd %eax, %xmm2			; AVX512-NEXT: vmovd %eax, %xmm2
	; AVX512-NEXT: vpblendd {{.*#+}} ymm2 = ymm2[0],mem[1,2,3,4,5,6,7]			; AVX512-NEXT: vpblendd {{.*#+}} ymm2 = ymm2[0],mem[1,2,3,4,5,6,7]
	; AVX512-NEXT: vpaddd %ymm2, %ymm0, %ymm3			; AVX512-NEXT: vpaddd %ymm2, %ymm0, %ymm3
	; AVX512-NEXT: vpsllvd %ymm2, %ymm0, %ymm0			; AVX512-NEXT: vpsllvd %ymm2, %ymm0, %ymm0
	; AVX512-NEXT: vpblendd {{.*#+}} ymm0 = ymm3[0],ymm0[1,2,3,4,5,6,7]			; AVX512-NEXT: vpblendd {{.*#+}} ymm0 = ymm3[0],ymm0[1,2,3,4,5,6,7]
	; AVX512-NEXT: vmovdqa c+{{.*}}(%rip), %xmm2			; AVX512-NEXT: vmovdqa .Lc$local+{{.*}}(%rip), %xmm2
	; AVX512-NEXT: vmovdqu %ymm0, c+{{.*}}(%rip)			; AVX512-NEXT: vmovdqu %ymm0, .Lc$local+{{.*}}(%rip)
	; AVX512-NEXT: vmovdqu c+{{.*}}(%rip), %ymm0			; AVX512-NEXT: vmovdqu .Lc$local+{{.*}}(%rip), %ymm0
	; AVX512-NEXT: vmovdqu64 d+{{.*}}(%rip), %zmm3			; AVX512-NEXT: vmovdqu64 .Ld$local+{{.*}}(%rip), %zmm3
	; AVX512-NEXT: vpinsrd $0, %eax, %xmm2, %xmm2			; AVX512-NEXT: vpinsrd $0, %eax, %xmm2, %xmm2
	; AVX512-NEXT: vinserti32x4 $0, %xmm2, %zmm1, %zmm1			; AVX512-NEXT: vinserti32x4 $0, %xmm2, %zmm1, %zmm1
	; AVX512-NEXT: vinserti64x4 $1, %ymm0, %zmm1, %zmm1			; AVX512-NEXT: vinserti64x4 $1, %ymm0, %zmm1, %zmm1
	; AVX512-NEXT: vpsubd %zmm1, %zmm3, %zmm1			; AVX512-NEXT: vpsubd %zmm1, %zmm3, %zmm1
	; AVX512-NEXT: vmovdqu64 %zmm1, d+{{.*}}(%rip)			; AVX512-NEXT: vmovdqu64 %zmm1, .Ld$local+{{.*}}(%rip)
	; AVX512-NEXT: vpaddd %ymm0, %ymm0, %ymm0			; AVX512-NEXT: vpaddd %ymm0, %ymm0, %ymm0
	; AVX512-NEXT: vmovdqu %ymm0, c+{{.*}}(%rip)			; AVX512-NEXT: vmovdqu %ymm0, .Lc$local+{{.*}}(%rip)
	; AVX512-NEXT: vzeroupper			; AVX512-NEXT: vzeroupper
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	;			;
	; XOP-LABEL: PR42833:			; XOP-LABEL: PR42833:
	; XOP: # %bb.0:			; XOP: # %bb.0:
	; XOP-NEXT: vmovdqa c+{{.*}}(%rip), %xmm0			; XOP-NEXT: vmovdqa .Lc$local+{{.*}}(%rip), %xmm0
	; XOP-NEXT: vmovd %xmm0, %eax			; XOP-NEXT: vmovd %xmm0, %eax
	; XOP-NEXT: addl {{.*}}(%rip), %eax			; XOP-NEXT: addl {{.*}}(%rip), %eax
	; XOP-NEXT: vmovdqa {{.*#+}} xmm1 = <u,1,1,1>			; XOP-NEXT: vmovdqa {{.*#+}} xmm1 = <u,1,1,1>
	; XOP-NEXT: vpinsrd $0, %eax, %xmm1, %xmm1			; XOP-NEXT: vpinsrd $0, %eax, %xmm1, %xmm1
	; XOP-NEXT: vpaddd %xmm1, %xmm0, %xmm2			; XOP-NEXT: vpaddd %xmm1, %xmm0, %xmm2
	; XOP-NEXT: vmovdqa c+{{.*}}(%rip), %xmm3			; XOP-NEXT: vmovdqa .Lc$local+{{.*}}(%rip), %xmm3
	; XOP-NEXT: vpshld %xmm1, %xmm0, %xmm1			; XOP-NEXT: vpshld %xmm1, %xmm0, %xmm1
	; XOP-NEXT: vpslld $1, %xmm3, %xmm3			; XOP-NEXT: vpslld $1, %xmm3, %xmm3
	; XOP-NEXT: vinsertf128 $1, %xmm3, %ymm1, %ymm1			; XOP-NEXT: vinsertf128 $1, %xmm3, %ymm1, %ymm1
	; XOP-NEXT: vblendps {{.*#+}} ymm1 = ymm2[0],ymm1[1,2,3,4,5,6,7]			; XOP-NEXT: vblendps {{.*#+}} ymm1 = ymm2[0],ymm1[1,2,3,4,5,6,7]
	; XOP-NEXT: vmovdqa d+{{.*}}(%rip), %xmm2			; XOP-NEXT: vmovdqa .Ld$local+{{.*}}(%rip), %xmm2
	; XOP-NEXT: vpsubd c+{{.*}}(%rip), %xmm2, %xmm2			; XOP-NEXT: vpsubd .Lc$local+{{.*}}(%rip), %xmm2, %xmm2
	; XOP-NEXT: vmovups %ymm1, c+{{.*}}(%rip)			; XOP-NEXT: vmovups %ymm1, .Lc$local+{{.*}}(%rip)
	; XOP-NEXT: vpinsrd $0, %eax, %xmm0, %xmm0			; XOP-NEXT: vpinsrd $0, %eax, %xmm0, %xmm0
	; XOP-NEXT: vmovdqa d+{{.*}}(%rip), %xmm1			; XOP-NEXT: vmovdqa .Ld$local+{{.*}}(%rip), %xmm1
	; XOP-NEXT: vpsubd %xmm0, %xmm1, %xmm0			; XOP-NEXT: vpsubd %xmm0, %xmm1, %xmm0
	; XOP-NEXT: vmovdqa d+{{.*}}(%rip), %xmm1			; XOP-NEXT: vmovdqa .Ld$local+{{.*}}(%rip), %xmm1
	; XOP-NEXT: vmovdqa c+{{.*}}(%rip), %xmm3			; XOP-NEXT: vmovdqa .Lc$local+{{.*}}(%rip), %xmm3
	; XOP-NEXT: vpsubd %xmm3, %xmm1, %xmm1			; XOP-NEXT: vpsubd %xmm3, %xmm1, %xmm1
	; XOP-NEXT: vmovdqa d+{{.*}}(%rip), %xmm4			; XOP-NEXT: vmovdqa .Ld$local+{{.*}}(%rip), %xmm4
	; XOP-NEXT: vmovdqa c+{{.*}}(%rip), %xmm5			; XOP-NEXT: vmovdqa .Lc$local+{{.*}}(%rip), %xmm5
	; XOP-NEXT: vpsubd %xmm5, %xmm4, %xmm4			; XOP-NEXT: vpsubd %xmm5, %xmm4, %xmm4
	; XOP-NEXT: vmovdqa %xmm2, d+{{.*}}(%rip)			; XOP-NEXT: vmovdqa %xmm2, .Ld$local+{{.*}}(%rip)
	; XOP-NEXT: vmovdqa %xmm4, d+{{.*}}(%rip)			; XOP-NEXT: vmovdqa %xmm4, .Ld$local+{{.*}}(%rip)
	; XOP-NEXT: vmovdqa %xmm1, d+{{.*}}(%rip)			; XOP-NEXT: vmovdqa %xmm1, .Ld$local+{{.*}}(%rip)
	; XOP-NEXT: vmovdqa %xmm0, d+{{.*}}(%rip)			; XOP-NEXT: vmovdqa %xmm0, .Ld$local+{{.*}}(%rip)
	; XOP-NEXT: vpaddd %xmm3, %xmm3, %xmm0			; XOP-NEXT: vpaddd %xmm3, %xmm3, %xmm0
	; XOP-NEXT: vpaddd %xmm5, %xmm5, %xmm1			; XOP-NEXT: vpaddd %xmm5, %xmm5, %xmm1
	; XOP-NEXT: vmovdqa %xmm1, c+{{.*}}(%rip)			; XOP-NEXT: vmovdqa %xmm1, .Lc$local+{{.*}}(%rip)
	; XOP-NEXT: vmovdqa %xmm0, c+{{.*}}(%rip)			; XOP-NEXT: vmovdqa %xmm0, .Lc$local+{{.*}}(%rip)
	; XOP-NEXT: vzeroupper			; XOP-NEXT: vzeroupper
	; XOP-NEXT: retq			; XOP-NEXT: retq
	%1 = load i32, i32* @b, align 4			%1 = load i32, i32* @b, align 4
	%2 = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([49 x i32], [49 x i32]* @c, i64 0, i64 32) to <8 x i32>*), align 16			%2 = load <8 x i32>, <8 x i32>* bitcast (i32* getelementptr inbounds ([49 x i32], [49 x i32]* @c, i64 0, i64 32) to <8 x i32>*), align 16
	%3 = shufflevector <8 x i32> %2, <8 x i32> undef, <16 x i32> <i32 undef, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%3 = shufflevector <8 x i32> %2, <8 x i32> undef, <16 x i32> <i32 undef, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	%4 = extractelement <8 x i32> %2, i32 0			%4 = extractelement <8 x i32> %2, i32 0
	%5 = add i32 %1, %4			%5 = add i32 %1, %4
	%6 = insertelement <8 x i32> <i32 undef, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>, i32 %5, i32 0			%6 = insertelement <8 x i32> <i32 undef, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>, i32 %5, i32 0
	(15 lines not shown)

llvm/test/CodeGen/X86/pr38795.ll

	(87 lines not shown)
	; CHECK-NEXT: jne .LBB0_12			; CHECK-NEXT: jne .LBB0_12
	; CHECK-NEXT: .LBB0_17: # %if.end39			; CHECK-NEXT: .LBB0_17: # %if.end39
	; CHECK-NEXT: # in Loop: Header=BB0_1 Depth=1			; CHECK-NEXT: # in Loop: Header=BB0_1 Depth=1
	; CHECK-NEXT: testl %eax, %eax			; CHECK-NEXT: testl %eax, %eax
	; CHECK-NEXT: je .LBB0_19			; CHECK-NEXT: je .LBB0_19
	; CHECK-NEXT: # %bb.18: # %if.then41			; CHECK-NEXT: # %bb.18: # %if.then41
	; CHECK-NEXT: # in Loop: Header=BB0_1 Depth=1			; CHECK-NEXT: # in Loop: Header=BB0_1 Depth=1
	; CHECK-NEXT: movl $0, {{[0-9]+}}(%esp)			; CHECK-NEXT: movl $0, {{[0-9]+}}(%esp)
	; CHECK-NEXT: movl $fn, {{[0-9]+}}(%esp)			; CHECK-NEXT: movl $.Lfn$local, {{[0-9]+}}(%esp)
	; CHECK-NEXT: movl $.str, (%esp)			; CHECK-NEXT: movl $.str, (%esp)
	; CHECK-NEXT: calll printf			; CHECK-NEXT: calll printf
	; CHECK-NEXT: .LBB0_19: # %for.end46			; CHECK-NEXT: .LBB0_19: # %for.end46
	; CHECK-NEXT: # in Loop: Header=BB0_1 Depth=1			; CHECK-NEXT: # in Loop: Header=BB0_1 Depth=1
	; CHECK-NEXT: # implicit-def: $dl			; CHECK-NEXT: # implicit-def: $dl
	; CHECK-NEXT: # implicit-def: $dh			; CHECK-NEXT: # implicit-def: $dh
	; CHECK-NEXT: # implicit-def: $ebp			; CHECK-NEXT: # implicit-def: $ebp
	; CHECK-NEXT: jmp .LBB0_20			; CHECK-NEXT: jmp .LBB0_20
	(157 lines not shown)

llvm/test/CodeGen/X86/tailcallpic1.ll

	; RUN: llc < %s -tailcallopt -mtriple=i686-pc-linux-gnu -relocation-model=pic | FileCheck %s			; RUN: llc < %s -tailcallopt -mtriple=i686-pc-linux-gnu -relocation-model=pic | FileCheck %s

	; This test uses guaranteed TCO so these will be tail calls, despite the early			; This test uses guaranteed TCO so these will be tail calls, despite the early
	; binding issues.			; binding issues.

	define protected fastcc i32 @tailcallee(i32 %a1, i32 %a2, i32 %a3, i32 %a4) {			define protected fastcc i32 @tailcallee(i32 %a1, i32 %a2, i32 %a3, i32 %a4) {
	entry:			entry:
	ret i32 %a3			ret i32 %a3
	}			}

	define fastcc i32 @tailcaller(i32 %in1, i32 %in2) {			define fastcc i32 @tailcaller(i32 %in1, i32 %in2) {
	entry:			entry:
	%tmp11 = tail call fastcc i32 @tailcallee( i32 %in1, i32 %in2, i32 %in1, i32 %in2 ) ; <i32> [#uses=1]			%tmp11 = tail call fastcc i32 @tailcallee( i32 %in1, i32 %in2, i32 %in1, i32 %in2 ) ; <i32> [#uses=1]
	ret i32 %tmp11			ret i32 %tmp11
	; CHECK: jmp tailcallee			; CHECK: jmp .Ltailcallee$local
	}			}

llvm/test/CodeGen/X86/tailcallpic3.ll

	(10 lines not shown)
	}			}

	define void @tailcall_hidden() {			define void @tailcall_hidden() {
	entry:			entry:
	tail call void @tailcallee_hidden()			tail call void @tailcallee_hidden()
	ret void			ret void
	}			}
	; CHECK: tailcall_hidden:			; CHECK: tailcall_hidden:
	; CHECK: jmp tailcallee_hidden			; CHECK: jmp .Ltailcallee_hidden$local

	define internal void @tailcallee_internal() {			define internal void @tailcallee_internal() {
	entry:			entry:
	ret void			ret void
	}			}

	define void @tailcall_internal() {			define void @tailcall_internal() {
	entry:			entry:
	(46 lines not shown)

llvm/test/CodeGen/X86/tailccpic1.ll

	; RUN: llc < %s -mtriple=i686-pc-linux-gnu -relocation-model=pic | FileCheck %s			; RUN: llc < %s -mtriple=i686-pc-linux-gnu -relocation-model=pic | FileCheck %s

	; This test uses guaranteed TCO so these will be tail calls, despite the early			; This test uses guaranteed TCO so these will be tail calls, despite the early
	; binding issues.			; binding issues.

	define protected tailcc i32 @tailcallee(i32 %a1, i32 %a2, i32 %a3, i32 %a4) {			define protected tailcc i32 @tailcallee(i32 %a1, i32 %a2, i32 %a3, i32 %a4) {
	entry:			entry:
	ret i32 %a3			ret i32 %a3
	}			}

	define tailcc i32 @tailcaller(i32 %in1, i32 %in2) {			define tailcc i32 @tailcaller(i32 %in1, i32 %in2) {
	entry:			entry:
	%tmp11 = tail call tailcc i32 @tailcallee( i32 %in1, i32 %in2, i32 %in1, i32 %in2 ) ; <i32> [#uses=1]			%tmp11 = tail call tailcc i32 @tailcallee( i32 %in1, i32 %in2, i32 %in1, i32 %in2 ) ; <i32> [#uses=1]
	ret i32 %tmp11			ret i32 %tmp11
	; CHECK: jmp tailcallee			; CHECK: jmp .Ltailcallee$local
	}			}

llvm/test/CodeGen/X86/tls.ll

	(204 lines not shown)
	; MINGW32-NEXT: retl			; MINGW32-NEXT: retl

	entry:			entry:
	ret i32* @i3			ret i32* @i3
	}			}

	define i32 @f7() {			define i32 @f7() {
	; X86_LINUX-LABEL: f7:			; X86_LINUX-LABEL: f7:
	; X86_LINUX: movl %gs:i4@NTPOFF, %eax			; X86_LINUX: movl %gs:.Li4$local@NTPOFF, %eax
	; X86_LINUX-NEXT: ret			; X86_LINUX-NEXT: ret
	; X64_LINUX-LABEL: f7:			; X64_LINUX-LABEL: f7:
	; X64_LINUX: movl %fs:i4@TPOFF, %eax			; X64_LINUX: movl %fs:.Li4$local@TPOFF, %eax
	; X64_LINUX-NEXT: ret			; X64_LINUX-NEXT: ret
	; MINGW32-LABEL: _f7:			; MINGW32-LABEL: _f7:
	; MINGW32: movl __tls_index, %eax			; MINGW32: movl __tls_index, %eax
	; MINGW32-NEXT: movl %fs:44, %ecx			; MINGW32-NEXT: movl %fs:44, %ecx
	; MINGW32-NEXT: movl (%ecx,%eax,4), %eax			; MINGW32-NEXT: movl (%ecx,%eax,4), %eax
	; MINGW32-NEXT: movl _i4@SECREL32(%eax), %eax			; MINGW32-NEXT: movl _i4@SECREL32(%eax), %eax
	; MINGW32-NEXT: retl			; MINGW32-NEXT: retl

	entry:			entry:
	%tmp1 = load i32, i32* @i4			%tmp1 = load i32, i32* @i4
	ret i32 %tmp1			ret i32 %tmp1
	}			}

	define i32* @f8() {			define i32* @f8() {
	; X86_LINUX-LABEL: f8:			; X86_LINUX-LABEL: f8:
	; X86_LINUX: movl %gs:0, %eax			; X86_LINUX: movl %gs:0, %eax
	; X86_LINUX-NEXT: leal i4@NTPOFF(%eax), %eax			; X86_LINUX-NEXT: leal .Li4$local@NTPOFF(%eax), %eax
	; X86_LINUX-NEXT: ret			; X86_LINUX-NEXT: ret
	; X64_LINUX-LABEL: f8:			; X64_LINUX-LABEL: f8:
	; X64_LINUX: movq %fs:0, %rax			; X64_LINUX: movq %fs:0, %rax
	; X64_LINUX-NEXT: leaq i4@TPOFF(%rax), %rax			; X64_LINUX-NEXT: leaq .Li4$local@TPOFF(%rax), %rax
	; X64_LINUX-NEXT: ret			; X64_LINUX-NEXT: ret
	; MINGW32-LABEL: _f8:			; MINGW32-LABEL: _f8:
	; MINGW32: movl __tls_index, %eax			; MINGW32: movl __tls_index, %eax
	; MINGW32-NEXT: movl %fs:44, %ecx			; MINGW32-NEXT: movl %fs:44, %ecx
	; MINGW32-NEXT: movl (%ecx,%eax,4), %eax			; MINGW32-NEXT: movl (%ecx,%eax,4), %eax
	; MINGW32-NEXT: leal _i4@SECREL32(%eax), %eax			; MINGW32-NEXT: leal _i4@SECREL32(%eax), %eax
	; MINGW32-NEXT: retl			; MINGW32-NEXT: retl

	(268 lines not shown)

[X86][ELF] Prefer to lower MC_GlobalAddress operands to .Lfoo$local
Closed, Public

Diff 241621

llvm/lib/Target/X86/X86MCInstLower.cpp

llvm/test/CodeGen/X86/code-model-elf.ll

llvm/test/CodeGen/X86/emutls.ll

llvm/test/CodeGen/X86/fold-add-pcrel.ll

llvm/test/CodeGen/X86/linux-preemption.ll

llvm/test/CodeGen/X86/oddsubvector.ll

llvm/test/CodeGen/X86/pr38795.ll

llvm/test/CodeGen/X86/tailcallpic1.ll

llvm/test/CodeGen/X86/tailcallpic3.ll

llvm/test/CodeGen/X86/tailccpic1.ll

llvm/test/CodeGen/X86/tls.ll
