This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contents

- lib/Target/X86/CMakeLists.txt
- lib/Target/X86/X86.h
- lib/Target/X86/X86FixupZExt.cpp
- lib/Target/X86/X86TargetMachine.cpp
- test/CodeGen/X86/avx-intrinsics-x86.ll
- test/CodeGen/X86/avx512-cmp.ll
- test/CodeGen/X86/cmpxchg-i1.ll
- test/CodeGen/X86/cmpxchg-i128-i1.ll
- test/CodeGen/X86/fast-isel-cmp.ll
- test/CodeGen/X86/fp128-cast.ll
- test/CodeGen/X86/fp128-compare.ll
- test/CodeGen/X86/sse-intrinsics-fast-isel.ll
- test/CodeGen/X86/sse-intrinsics-x86.ll

Differential D23253

[X86] Generalized transformation of `definstr gr8; movzx gr32, gr8` to `xor gr32, gr32; definstr gr8`
Needs Review, Public

Authored by bryant on Aug 7 2016, 9:02 PM.

Details

Reviewers

mkuper
llvm-commits

Summary

As indicated by the title, this post-register allocation pre-rewrite pass
generalizes D21774 by matching patterns of the form,

gr8<def> = instr_defining_gr8  # that may or may not use eflags
gr32<def> = movzx gr8

into,

gr32 = mov32r0 eflags<imp-def>  # carefully avoids clobbering eflags
...
gr8<def> = instr_defining_gr8

with the goal of reducing read stalls, partial register stalls, micro-ops, and
overall binary size.
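
A minimal sketch of why the rewrite preserves the result, assuming a toy register model (the `Reg32` type and the function names are hypothetical and not part of the patch). An 8-bit write leaves bits 8..31 alone, so zeroing the full register beforehand yields the same value as zero-extending afterwards:

```cpp
#include <cassert>
#include <cstdint>

// Toy model of a 32-bit register whose low byte can be written on its own,
// the way AL aliases the low 8 bits of EAX.
struct Reg32 {
  uint32_t Value;
  // An 8-bit write replaces only the low byte; bits 8..31 keep their old value.
  void write8(uint8_t Byte) { Value = (Value & ~uint32_t{0xFF}) | Byte; }
};

// Original pattern: define the gr8, then zero-extend it into the gr32.
uint32_t defThenMovzx(uint32_t Stale, uint8_t Byte) {
  Reg32 R{Stale};
  R.write8(Byte);           // gr8<def> = instr_defining_gr8
  R.Value = R.Value & 0xFF; // gr32<def> = movzx gr8
  return R.Value;
}

// Rewritten pattern: zero the gr32 first, then define its low byte.
uint32_t xorThenDef(uint32_t Stale, uint8_t Byte) {
  Reg32 R{Stale};
  R.Value = 0;    // gr32 = mov32r0 (xor gr32, gr32)
  R.write8(Byte); // gr8<def> = instr_defining_gr8
  return R.Value;
}

// Compares both sequences for every possible byte value.
void checkAllBytes(uint32_t Stale) {
  for (unsigned B = 0; B <= 0xFF; ++B)
    assert(defThenMovzx(Stale, uint8_t(B)) == xorThenDef(Stale, uint8_t(B)));
}
```

The model ignores EFLAGS. The real pass also has to place the zeroing instruction where it does not clobber live flags, as the comment in the summary notes.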

Except for a few rare cases, it never performs worse than D21774, and does
surprisingly better in other cases. IACA-annotated assembly output can be found
at https://reviews.llvm.org/P7213 (for x86-64) and at
https://reviews.llvm.org/P7214 (for x86-32).

Not all of the tests have been updated; this is still a work in progress.

EDIT: grammar.

Diff Detail

Repository: rL LLVM

Event Timeline

bryant updated this revision to Diff 67116. Aug 7 2016, 9:02 PM

bryant retitled this revision to [X86] Generalized transformation of `definstr gr8; movzx gr32, gr8` to `xor gr32, gr32; definstr gr8`.

bryant updated this object.

bryant added reviewers: llvm-commits, mkuper.

bryant set the repository for this revision to rL LLVM.

bryant updated this object.

I should add that the benchmark data in the two aforementioned links were generated by running llc -march={x86,x86-64} -mattr=+sse,+sse4.2 -O3 over the entire test set under test/CodeGen/X86. An appreciable number of cases do not compile under these flags (or simply produce assembly output that gas cannot compile) and have been omitted by my annotator.

I would also like to note that because this pass works by register
re-allocation, it _never_ "pessimizes" the way D21774 does. For instance, under
-march=x86 (diff of assembler output of
test/CodeGen/X86/sse42-intrinsics-x86.ll, with "-" indicating D21774 and "+"
indicating this pass):

 test_x86_sse42_pcmpestria128:
-# Throughput: 2.25; Uops: 8; Latency: 7; Size: 36
+# Throughput: 1.55; Uops: 6; Latency: 3; Size: 33
-    pushl    %ebx
     movl    $7, %eax
     movl    $7, %edx
-    xorl    %ebx, %ebx
     pcmpestri    $7, %xmm1, %xmm0
-    seta    %bl
-    movl    %ebx, %eax
-    popl    %ebx
+    seta    %al
+    movzbl    %al, %eax
 retl

Or perhaps less obviously (from cmpxchg-i1.ll under x86-64):

 cmpxchg_zext:
-# Throughput: 2.05; Uops: 6; Latency: 10; Size: 24
+# Throughput: 1.85; Uops: 7; Latency: 10; Size: 23
-    xorl    %ecx, %ecx
     movl    %esi, %eax
     lock        cmpxchgl    %edx, (%rdi)
-    sete    %cl
-    movl    %ecx, %eax
+    sete    %al
+    movzbl    %al, %eax
 retq

On the other hand, because it matches on any gr8-defining instruction (not just
setccs), it can do things like (from pre-ra-sched.ll under x86-64):

 test:
-# Throughput: 11.05; Uops: 27; Latency: 14; Size: 77
+# Throughput: 6.55; Uops: 24; Latency: 14; Size: 77
     movzbl    1(%rdi), %r9d
-    movb    2(%rdi), %al
-    xorb    %r9b, %al
+    xorl    %r10d, %r10d
+    movb    2(%rdi), %r10b
+    xorb    %r9b, %r10b
     movzbl    3(%rdi), %esi
-    movb    4(%rdi), %cl
-    xorb    %sil, %cl
+    xorl    %eax, %eax
+    movb    4(%rdi), %al
+    xorb    %sil, %al
     movzbl    5(%rdi), %r8d
-    movb    6(%rdi), %dl
-    xorb    %r8b, %dl
+    xorl    %ecx, %ecx
+    movb    6(%rdi), %cl
+    xorb    %r8b, %cl
     cmpb    $0, (%rdi)
-    movzbl    %al, %edi
-    cmovnel    %r9d, %edi
-    movzbl    %cl, %eax
+    cmovnel    %r9d, %r10d
     cmovnel    %esi, %eax
-    movzbl    %dl, %ecx
     cmovnel    %r8d, %ecx
     testl    %r9d, %r9d
-    cmovnel    %edi, %eax
+    cmovnel    %r10d, %eax
     testl    %esi, %esi
     cmovel    %ecx, %eax
     retq

which, if IACA is to be believed, is a roughly 40% throughput improvement (11.05 down to 6.55) for free.
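
The percentage can be checked from the two IACA throughput figures in the listing above. A small sketch, assuming the IACA throughput number is a cycle count where lower is better (the helper name is made up for illustration):

```cpp
#include <cassert>

// Fractional reduction when going from Before to After,
// e.g. 0.25 means the figure dropped by 25%.
double relativeReduction(double Before, double After) {
  assert(Before > 0.0 && "baseline must be positive");
  return (Before - After) / Before;
}
```

With 11.05 and 6.55 this gives (11.05 - 6.55) / 11.05, which is about 0.407. The same helper applied to the micro-op counts (27 and 24) gives about 0.11.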

Also, it should be noted that this differential depends on the zip iterator
introduced in D23252.

Hi bryant,

Thanks a lot for working on this! The improvements look really nice, and it's great that this fixes the pessimizations from rL274692.

I haven't really looked at the patch yet, only skimmed it briefly - so I still don't have even the slightest opinion on the logic.
So, for now, some very general comments/questions:

Have you considered adding logic to ExecutionDepsFix or X86FixupBWInsts, as opposed to a new pass? I haven't thought about this too much myself, but those are post-RA passes with somewhat similar goals, and it may make sense to share some of the code.
How generic is this? E.g. does this handle PR28442?
Do you know why this is failing on cases that we catch with rL274692? If the improvement is large enough we don't have to be strictly better than existing code (rL274692 itself wasn't :) ), but it'd be good to understand what's going on so we can at least guess whether it can happen in "interesting" non-toy cases.
This is a fairly large patch. It's probably fairly self-contained, and I don't immediately see a way to break it into meaningful functional pieces. However, that makes it a priori harder to review. So it'd be really nice to make reviewers' lives easier by providing copious documentation. :)
There's a decent amount of generic code that doesn't look like it should live in the pass. If it's generally useful, then it should live in one of the utility headers. If not, and it's only ever instantiated for a single type, then we can get rid of the extra template complexity. (Also, beware of the lack of compiler support. We need to build LLVM with some fairly old or odd compilers, MSVC 2013 first and foremost. If you haven't seen a pattern used in LLVM, it may be because we still want to be built with compilers that don't support it.)
A couple of pervasive differences from the LLVM coding conventions I've noticed from skimming:
- We generally use camelcase for identifiers, with variables starting with an uppercase letter, and functions with lowercase. Common acronyms (like TRI) are all-caps. There are some exceptions ("standard" constructs like "int i", lower cases with underscores where we want to fit STL style, changes to old code that uses a different convention), but those are fairly rare.
- We don't usually explicitly compare to nullptr.
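
As a rough illustration of the two conventions above (the `RegInfo` type and `getRegName` function are hypothetical and only serve as an example, they are not LLVM APIs):

```cpp
#include <cassert>
#include <string>

// Hypothetical type, used only to illustrate the conventions.
struct RegInfo {
  std::string Name;
};

// Functions: camel case starting with a lowercase letter.
// Variables and parameters: camel case starting with an uppercase letter.
// Common acronyms such as TRI stay all-caps.
std::string getRegName(const RegInfo *TRI) {
  // Test the pointer directly instead of writing `TRI != nullptr`.
  if (!TRI)
    return "<none>";
  assert(!TRI->Name.empty() && "a register is expected to have a name");
  return TRI->Name;
}
```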

mkuper added subscribers: RKSimon, spatel, DavidKreitzer. Aug 8 2016, 12:37 AM

Fixed compile error caused by lack of "const" in return type.

Rely on RAII to clean up unused MOV32r0
Outsource register allocation hints to VirtRegAuxInfo::copyHint
Add regmask constraints if no hints are found

Herald added a subscriber: qcolombet. Aug 14 2016, 11:23 AM

Conform to camel-case naming style for functions.

Conform to variable name casing convention.

Convert explicit checks for nullptr to operator bool.

In D23253#508298, @mkuper wrote:

Hi bryant,

Thanks a lot for working on this! The improvements looks really nice, and it's
great that this fixes the pessimizations from rL274692.

I haven't really looked at the patch yet, only skimmed it briefly - so I still
don't have even the slightest opinion on the logic. So, for now, some very
general comments/questions:

Please, please review. I've put a great deal of thinking into this.

Have you considered adding logic to ExecutionDepsFix or X86FixupBWInsts, as
opposed to a new pass? I haven't thought about this too much myself, but those
are post-RA passes with somewhat similar goals, and it may make sense to share
some of the code.

How do you propose doing this in either of those two passes? In any case,
the heuristics rely on virtual register liveness data that is rewritten
away long before the pre-sched/pre-emit passes execute.

How generic is this? E.g. does this handle PR28442?

It matches on any GR8-defining instruction, so I'd say...as generic as can be?
And yes, of course it handles PR28442:

19:54:19 ~/3rd/llvm> cat > pr28442.c
int foo(int a, int b, int c) {
  return (a > 0 && b > 0 && c > 0);
}
20:15:40 ~/3rd/llvm> clang -o - -S -emit-llvm -O3 pr28442.c | llc -filetype=obj -O3 -o setccfixup.o -setcc-fixup
20:16:07 ~/3rd/llvm> clang -o - -S -emit-llvm -O3 pr28442.c | llc -filetype=obj -O3 -o zextfixup.o
20:16:34 ~/3rd/llvm> diff -u <(python iacapipe.py setccfixup.o) <(python iacapipe.py zextfixup.o)
--- /dev/fd/63  2016-08-14 20:16:48.477885771 +0000
+++ /dev/fd/62  2016-08-14 20:16:48.477885771 +0000
@@ -1,13 +1,13 @@
-# Throughput: 2.65; Uops: 10; Latency: 5; Size: 23
+# Throughput: 2.65; Uops: 9; Latency: 5; Size: 22
 foo:
     test   %edi,%edi
     setg   %al
     test   %esi,%esi
     setg   %cl
     and    %al,%cl
+    xor    %eax,%eax
     test   %edx,%edx
     setg   %al
     and    %cl,%al
-    movzbl %al,%eax
     retq

Do you know why this is failing on cases that we catch with rL274692? If the
improvement is large enough we don't have to be strictly better than existing
code (rL274692 itself wasn't :) ), but it'd be good to understand what's going
on so we can at least guess whether it can happen in "interesting" non-toy
cases.

From my tests on CodeGen/X86/*.ll, it only ever fails on x86-32 for two
reasons:

The x86-32 allocation order for CSRs prioritizes ESI and EDI over EBX. Because
this pass never touches unused CSRs, it is possible for a function to allocate
E{AX,CX,DX,SI,DI} but not EBX, thus reducing the pool of available
GR32_with_sub8bit registers by one. This is illustrated by 2008-09-11-CoalescerBug.ll:

20:22:18 ~/3rd/llvm> diff -u --unified=9999999 <(python iacapipe.py obj/kuper-x86/2008-09-11-CoalescerBug.o) <(python iacapipe.py obj/control-x86/2008-09-11-CoalescerBug.o)
--- /dev/fd/63  2016-08-14 20:23:00.352808087 +0000
+++ /dev/fd/62  2016-08-14 20:23:00.352808087 +0000
@@ -1,35 +1,34 @@
-# Throughput: 9.60; Uops: 43; Latency: 20; Size: 98
+# Throughput: 10.10; Uops: 45; Latency: 19; Size: 98
 func_3:
-    push   %ebx
+    push   %edi
     push   %esi
     push   %eax
     movzwl 0x0,%esi
     and    $0x1,%esi
     movl   $0x1,(%esp)
     call   15 <func_3+0x15>
     xor    %ecx,%ecx
     cmp    $0x2,%eax
     setl   %cl
-    xor    %ebx,%ebx
     cmp    %ecx,%esi
-    setge  %bl
-    xor    %eax,%eax
+    setge  %al
+    movzbl %al,%esi
     cmpw   $0x0,0x0
     sete   %al
-    mov    %eax,%esi
+    movzbl %al,%edi
     movl   $0x1,(%esp)
     call   3f <func_3+0x3f>
     xor    %ecx,%ecx
-    cmp    %eax,%esi
+    cmp    %eax,%edi
     setge  %cl
-    sub    %ecx,%ebx
-    cmp    $0x2,%ebx
+    sub    %ecx,%esi
+    cmp    $0x2,%esi
     sbb    %eax,%eax
     and    $0x1,%eax
     mov    %eax,(%esp)
     call   58 <func_3+0x58>
     add    $0x4,%esp
     pop    %esi
-    pop    %ebx
+    pop    %edi
     ret
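
The pool-size argument behind this first failure mode can be sketched as follows. This is a simplified model under stated assumptions, not code from the patch: register names are plain strings, the helper `usableSub8Pool` is hypothetical, and the callee-saved set is the one used by the common 32-bit calling conventions.

```cpp
#include <cassert>
#include <set>
#include <string>

// Counts the 32-bit registers with an 8-bit subregister that a pass may pick
// from when it refuses to touch callee-saved registers (CSRs) that the
// function has not already used.
unsigned usableSub8Pool(const std::set<std::string> &AlreadyUsedCSRs) {
  // On x86-32 only these four have low-byte subregisters (AL, CL, DL, BL).
  const std::set<std::string> WithSub8 = {"EAX", "ECX", "EDX", "EBX"};
  // Callee-saved registers in the common 32-bit calling conventions.
  const std::set<std::string> CSRs = {"EBX", "ESI", "EDI", "EBP"};
  unsigned Count = 0;
  for (const std::string &Reg : WithSub8) {
    bool IsCSR = CSRs.count(Reg) != 0;
    if (!IsCSR || AlreadyUsedCSRs.count(Reg) != 0)
      ++Count;
  }
  assert(Count >= 3 && "EAX, ECX and EDX are always candidates");
  return Count;
}
```

If the allocator only handed out ESI and EDI, the pool has three members. Once EBX is in use as well, it grows to four.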

It's also possible for RA to spill the GR8-defining instruction, thus
preventing this pass from recognizing the pattern. This is seen in
2009-08-23-SubRegReuseUndo.ll:

--- annot/kuper-x86/2009-08-23-SubRegReuseUndo.o	2016-08-14 20:27:55.032325645 +0000
+++ annot/control-x86/2009-08-23-SubRegReuseUndo.o	2016-08-14 20:27:07.079072090 +0000
@@ -1,88 +1,88 @@
-# Throughput: 31.00; Uops: 113; Latency: 32; Size: 285
+# Throughput: 32.00; Uops: 116; Latency: 32; Size: 290
 uint80:
     push   %ebp
     push   %ebx
     push   %edi
     push   %esi
     sub    $0x1c,%esp
     movsbl 0x30(%esp),%ecx
-    xor    %eax,%eax
     test   %cx,%cx
-    setne  %al
-    mov    %eax,%esi
+    setne  0x1b(%esp)
     movzwl %cx,%eax
-    mov    %ecx,%edi
+    mov    %ecx,%esi
     mov    %eax,(%esp)
     mov    $0x0,%eax
     movsbl %al,%eax
     mov    %eax,0x4(%esp)
     call   29 <uint80+0x29>
     mov    %eax,%ebx
     or     $0x1,%bl
     movl   $0x1,(%esp)
     call   3c <uint80+0x3c>
     movl   $0x0,0x4(%esp)
     movl   $0x0,(%esp)
     call   50 <uint80+0x50>
-    mov    %edi,%ecx
+    mov    %esi,%ecx
+    mov    %ecx,%edi
     xor    %cl,%al
     xor    %bl,%al
     mov    %eax,%ebp
-    mov    %esi,0x4(%esp)
+    movzbl 0x1b(%esp),%eax
+    mov    %eax,0x4(%esp)
     mov    $0x0,%eax
     movzwl %ax,%eax
     mov    %eax,(%esp)
     call   6c <uint80+0x6c>
     mov    %eax,%esi
     movl   $0x1,0x4(%esp)
     movl   $0x0,(%esp)
     call   82 <uint80+0x82>
     xor    %eax,%eax
     test   %al,%al
     jne .L0
     mov    $0x1,%eax
     xor    %ecx,%ecx
     test   %cl,%cl
     jne .L1
 .L0:
     xor    %eax,%eax
 .L1:
     xor    %ebx,%ebx
     cmp    %eax,%esi
     setne  %bl
     movl   $0xfffffffe,(%esp)
     call   ad <uint80+0xad>
     mov    %ebp,%eax
     movsbl %al,%eax
     mov    %eax,0xc(%esp)
     mov    %ebx,0x8(%esp)
     movl   $0x1,0x4(%esp)
     movl   $0x0,(%esp)
     call   ce <uint80+0xce>
     xor    %eax,%eax
     test   %al,%al
     jne .L2
     mov    0x0,%eax
 .L2:
     mov    0x0,%eax
     movl   $0x1,0x4(%esp)
     movl   $0x0,(%esp)
     call   f2 <uint80+0xf2>
     xor    %eax,%eax
     test   %al,%al
     je .L3
     add    $0x1c,%esp
     pop    %esi
     pop    %edi
     pop    %ebx
     pop    %ebp
     ret
 .L3:
     mov    %edi,%eax
     movsbl %al,%eax
     mov    0x0,%ecx
     mov    %eax,(%esp)
     movl   $0x1,0x4(%esp)
     call   11b <uint80+0x11b>
     sub    $0x8,%esp

This is a fairly large patch. It's probably fairly self-contained, and I
don't immediately see a way to break it into meaningful functional pieces.
However, that makes it a priori harder to review. So it'd be really nice to
make reviewers' lives easier by providing copious documentation. :)

Right, I'll sprinkle in some comments.

There's a decent amount of generic code that doesn't look like it should
live in the pass. If it's generally useful, then it should live in one of the
utility headers. If not, and it's only ever instantiated for a single type,
then we can get rid of the extra template complexity. (Also, beware of the
lack of compiler support. We need to build LLVM with some fairly old or odd
compilers, MSVC 2013 first and foremost. If you haven't seen a pattern used
in LLVM, it may be because we still want to be built with compilers that don't
support it.)

A couple of pervasive differences from the LLVM coding conventions I've
noticed from skimming:

- We generally use camelcase for identifiers, with variables starting with an
uppercase letter, and functions with lowercase. Common acronyms (like TRI) are
all-caps. There are some exceptions ("standard" constructs like "int i", lower
cases with underscores where we want to fit STL style, changes to old code
that uses a different convention), but those are fairly rare.
- We don't usually explicitly compare to nullptr.

Fixed.

bryant added inline comments. Aug 14 2016, 1:42 PM

lib/CodeGen/CalcSpillWeights.cpp
46 ↗	(On Diff #67985)	All this does is permit access to `copyHint` to the general public. Please advise if this belongs in a separate differential.

Fix nullptr deref caused by swapped cases.

upload the right patch.

n.bozhenov added a subscriber: n.bozhenov. Aug 17 2016, 5:26 AM

ping?

Hi,

What is the performance impact (both runtime and compile time) of the pass?
You list results for test/CodeGen/X86, which is not what we usually use for performance testing. Could you try the LLVM test suite?

Moreover, like Michael said, we should try to merge this pass with something else. A possible approach would be a peephole-like optimization where we can plug in more patterns.

Cheers,
-Quentin

lib/CodeGen/CalcSpillWeights.cpp
46–48 ↗	(On Diff #68285)	It seems wrong to export this interface. External users should rely on MachineRegisterInfo::getSimpleHint. What is the problem with using MachineRegisterInfo::getSimpleHint?

In D23253#522534, @qcolombet wrote:

Hi,

Thanks for actually reviewing this.

What is the performance impact (both runtime and compile time) of the pass?

Run-time impact, according to IACA, would be 1% to 40% (pre-ra-sched.ll) faster. Never worse. I'll measure the compile-time perf tonight. Do you have any suggestions on the preferred way to do this?

You list results for test/CodeGen/X86, which is not what we usually use for performance testing. Could you try the LLVM test suite?

My measurements have thus far centered around the performance of generated code. Since this pass only ever runs when -march=x86/-64, there would be no difference in the rest of the arches, improvements or otherwise. If you really want, I could run the rest of the suite but the result would be a bunch of passes and xfails. What sort of results do you really want me to record here?

Moreover, like Michael said, we should try to merge this pass with something else. A possible approach would be a peephole-like optimization where we can plug in more patterns.

As I've mentioned before, this depends on virtual register LiveInterval information that is lost after the rewrite pass.

Cheers,
-Quentin

lib/CodeGen/CalcSpillWeights.cpp
46–48 ↗	(On Diff #68285)	In cases where there is more than one hint, getSimpleHint only returns one and misses the rest.

Hi bryant,

I haven't found time to review this patch in detail yet, but here are some initial comments/questions.

Given some of your sample generated-code changes, I expected to see lots of changes in test/CodeGen/X86. Are you planning to add those changes to this patch later?

Your output code examples have a few instances of this:

+ xorl %r10d, %r10d
+ movb 2(%rdi), %r10b

Rather than insert an xor here, you'd prefer to convert the movb to movzbl.

Converting movb to movzbl (and movw to movzwl) is essentially what FixupBWInstPass does. The author of that pass was deliberately aggressive about converting movw to movzwl but a bit more conservative about converting movb to movzbl. Here are the relevant comments:

// Only replace 8 bit loads with the zero extending versions if
// in an inner most loop and not optimizing for size. This takes
// an extra byte to encode, and provides limited performance upside.

// Always try to replace 16 bit load with 32 bit zero extending.
// Code size is the same, and there is sometimes a perf advantage
// from eliminating a false dependence on the upper portion of
// the register.

This leads to 2 questions for me.

(1) What is the code size impact of this new pass?
(2) How does the behavior of this new pass compare to simply changing X86FixupBWInsts to always optimize the 8-bit case, i.e. not check for innermost loops?

Thanks,
Dave

In D23253#522637, @DavidKreitzer wrote:

Hi bryant,

I haven't found time to review this patch in detail yet, but here are some initial comments/questions.

Given some of your sample generated-code changes, I expected to see lots of changes in test/CodeGen/X86. Are you planning to add those changes to this patch later?

Yes, definitely. Will include the updates to test/*.ll in my next diff
release.

Your output code examples have a few instances of this:

+ xorl %r10d, %r10d
+ movb 2(%rdi), %r10b

Rather than insert an xor here, you'd prefer to convert the movb to movzbl.

Just to clarify, the optimal result is movzbl 2(%rdi), %r10d, not

movb 2(%rdi), %al; movzbl %al, %r10d (which is the status quo)

nor

xorl %r10d, %r10d; movb 2(%rdi), %r10b (which is what this patch does)

Correct?

Converting movb to movzbl (and movw to movzwl) is essentially what FixupBWInstPass does. The author of that pass was deliberately aggressive about converting movw to movzwl but a bit more conservative about converting movb to movzbl. Here are the relevant comments:
// Only replace 8 bit loads with the zero extending versions if
// in an inner most loop and not optimizing for size. This takes
// an extra byte to encode, and provides limited performance upside.

I find this comment strange. The stats on x86-64 are:

21:38:41 ~/3rd/llvm> python iacapipe.py --arch 64 a.out
# Throughput: 0.50; Uops: 2; Latency: 6; Size: 7
kuper:
    mov    0x2(%rdi),%al
    movzbl %al,%r10d

# Throughput: 0.50; Uops: 1; Latency: 5; Size: 7
this_patch:
    xor    %r10d,%r10d
    mov    0x2(%rdi),%r10b

# Throughput: 0.50; Uops: 1; Latency: 5; Size: 5
optimal:
    movzbl 0x2(%rdi),%r10d

And on x86-32:

21:39:32 ~/3rd/llvm> python iacapipe.py --arch 32 a.out
# Throughput: 0.50; Uops: 2; Latency: 6; Size: 6
kuper:
    mov    0x2(%edi),%al
    movzbl %al,%eax

# Throughput: 0.50; Uops: 1; Latency: 5; Size: 5
this_patch:
    xor    %eax,%eax
    mov    0x2(%edi),%al

# Throughput: 0.50; Uops: 1; Latency: 5; Size: 4
optimal:
    movzbl 0x2(%edi),%eax

So converting movb to movzbl yields smaller code with fewer micro-ops. Am I
missing something obvious?

// Always try to replace 16 bit load with 32 bit zero extending.
// Code size is the same, and there is sometimes a perf advantage
// from eliminating a false dependence on the upper portion of
// the register.

This leads to 2 questions for me.

(1) What is the code size impact of this new pass?

I'll compile a nice histogram from the diff data in
https://reviews.llvm.org/P7213 and https://reviews.llvm.org/P7214 .

(2) How does the behavior of this new pass compare to simply changing X86FixupBWInsts to always optimize the 8-bit case, i.e. not check for innermost loops?

If the above results from IACA are to be trusted, it would be better to always
convert "movb; movzbl" to plain old movzbl. Currently, this pass would pre-empt
X86FixupBWInsts by transforming the pair into "xor; movb" before the later pass
has a chance to see the movb-movzbl pattern. It would be easy to add special-case
patterns that this pass should leave untouched.

Thanks,
Dave

Please let me know if additional clarification is needed.

bryant added inline comments. Aug 22 2016, 7:16 PM

lib/CodeGen/CalcSpillWeights.cpp
46–48 ↗	(On Diff #68285)	Here is an example:

#include <stdint.h>

typedef struct {
  uint32_t low;
  uint32_t high;
} u32pair;

extern void use8(uint8_t);

uint32_t f(uint64_t *m, uint32_t a, uint32_t b) {
  u32pair src = {a > 0, b > 0};
  use8(a > 0);
  u32pair rv;
  asm volatile("cmpxchg8b %2"
               : "+a"(rv.low), "+d"(rv.high), "+m"(m)
               : "b"(src.low), "c"(src.high)
               : "flags");
  return rv.low > 0 && rv.high > 0;
}

pre-regalloc machineinstrs:

0B      BB#0: derived from LLVM BB %3
            Live Ins: %RDI %ESI %EDX
16B             %vreg2<def> = COPY %EDX; GR32:%vreg2
32B             %vreg1<def> = COPY %ESI; GR32:%vreg1
48B             %vreg0<def> = COPY %RDI; GR64:%vreg0
64B             MOV64mr <fi#0>, 1, %noreg, 0, %noreg, %vreg0; mem:ST8[%4](tbaa=!2) GR64:%vreg0
80B             TEST32rr %vreg1, %vreg1, %EFLAGS<imp-def>; GR32:%vreg1
96B             %vreg5<def> = SETNEr %EFLAGS<imp-use,kill>; GR8:%vreg5
112B            %vreg6<def> = MOVZX32rr8 %vreg5; GR32:%vreg6 GR8:%vreg5
128B            TEST32rr %vreg2, %vreg2, %EFLAGS<imp-def>; GR32:%vreg2
144B            %vreg7<def> = SETNEr %EFLAGS<imp-use,kill>; GR8:%vreg7
160B            %vreg8<def> = MOVZX32rr8 %vreg7; GR32:%vreg8 GR8:%vreg7
176B            ADJCALLSTACKDOWN64 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
192B   ==>      %EDI<def> = COPY %vreg6; GR32:%vreg6
208B            CALL64pcrel32 <ga:@use8>, <regmask %BH %BL %BP %BPL %BX %EBP %EBX %RBP %RBX %R12 %R13 %R14 %R15 %R12B %R13B %R14B %R15B %R12D %R13D %R14D %R15D %R12W %R13W %R14W %R15W>, %RSP<imp-use>, %EDI<imp-use>, %RSP<imp-def>
224B            ADJCALLSTACKUP64 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
240B   ==>      %EBX<def> = COPY %vreg6; GR32:%vreg6
256B            %ECX<def> = COPY %vreg8; GR32:%vreg8
272B            INLINEASM <es:cmpxchg8b $2> [sideeffect] [mayload] [maystore] [attdialect], $0:[regdef], %EAX<imp-def,tied>, $1:[regdef], %EDX<imp-def,tied>, $2:[mem:m], <fi#0>, 1, %noreg, 0, %noreg, $3:[reguse], %EBX<kill>, $4:[reguse], %ECX<kill>, $5:[reguse tiedto:$0], %EAX<undef,tied3>, $6:[reguse tiedto:$1], %EDX<undef,tied5>, $7:[mem:m], <fi#0>, 1, %noreg, 0, %noreg, $8:[clobber], %EFLAGS<earlyclobber,imp-def,dead>, $9:[clobber], %EFLAGS<earlyclobber,imp-def,dead>, <!5>
276B            %vreg12<def> = COPY %EDX; GR32:%vreg12
280B            %vreg11<def> = COPY %EAX; GR32:%vreg11
320B            TEST32rr %vreg11, %vreg11, %EFLAGS<imp-def>; GR32:%vreg11
336B            %vreg13<def> = SETNEr %EFLAGS<imp-use,kill>; GR8:%vreg13
352B            TEST32rr %vreg12, %vreg12, %EFLAGS<imp-def>; GR32:%vreg12
368B            %vreg15<def> = SETNEr %EFLAGS<imp-use,kill>; GR8:%vreg15
400B            %vreg15<def,tied1> = AND8rr %vreg15<tied0>, %vreg13, %EFLAGS<imp-def,dead>; GR8:%vreg15,%vreg13
416B            %vreg16<def> = MOVZX32rr8 %vreg15; GR32:%vreg16 GR8:%vreg15
432B            %EAX<def> = COPY %vreg16; GR32:%vreg16
448B            RET 0, %EAX

%vreg6 has two hints: EDI and EBX. However, getSimpleHint shows that
MachineRegisterInfo only records EDI:

selectOrSplit GR32:%vreg6 [112r,240r:0)  0@112r w=5.738636e-03
hints: %EDI
missed hint %EDI
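
To illustrate the difference between recording a single hint and recording all of them, here is a toy model. The `Copy` struct and the `allCopyHints` function are hypothetical and do not correspond to any LLVM API. They only mimic the idea that every COPY between a virtual register and a physical register is a potential allocation hint.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a COPY instruction.
struct Copy {
  std::string Dst;
  std::string Src;
};

// In this toy model, virtual registers are the ones spelled "%vregN".
bool isVirtual(const std::string &Reg) { return Reg.rfind("%vreg", 0) == 0; }

// Collects every physical register that VReg is copied to or from,
// in program order and without duplicates.
std::vector<std::string> allCopyHints(const std::vector<Copy> &Copies,
                                      const std::string &VReg) {
  assert(isVirtual(VReg) && "hints are collected for virtual registers");
  std::vector<std::string> Hints;
  for (const Copy &C : Copies) {
    const std::string *Other = nullptr;
    if (C.Dst == VReg)
      Other = &C.Src;
    else if (C.Src == VReg)
      Other = &C.Dst;
    if (!Other || isVirtual(*Other))
      continue;
    if (std::find(Hints.begin(), Hints.end(), *Other) == Hints.end())
      Hints.push_back(*Other);
  }
  return Hints;
}
```

Fed with the two copies of %vreg6 marked `==>` in the listing above (192B and 240B), this yields both %EDI and %EBX, whereas a query that returns only one hint stops at %EDI.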

Just to clarify, the optimal result is movzbl 2(%rdi), %r10d, not

movb 2(%rdi), %al; movzbl %al, %r10d (which is the status quo)

nor

xorl %r10d, %r10d; movb 2(%rdi), %r10b (which is what this patch does)

Correct?

Yes, exactly.

So converting movb to movzbl yields smaller code with fewer micro-ops. Am I
missing something obvious?

AHA! Yes, you are right that this conversion is always profitable:

mov    0x2(%edi),%al
movzbl %al,%eax
-->
movzbl 0x2(%edi),%eax

but the X86FixupBWInsts pass will also do this:

mov    0x2(%edi),%al
-->
movzbl 0x2(%edi),%eax

In other words, it will convert the movb to movzbl even if the result of the movb is never zero extended. Doing so is often a win for performance due to the partial register dependence of the movb, but movzbl does encode larger.

I assume you are saying that your new pass will only transform 8-bit moves that are subsequently zero extended? In that case, you can disregard my code size concern for now.

Ultimately, we will want to conditionally optimize the 8-bit instructions that do not feed into a zero extend. (That applies both to the movb --> movzbl transformation and to inserting xor before instructions like SETcc.) It is just a bit harder, due to the tradeoff that needs to be made. I made the same comment on Michael's FixupSetCC pass - just haven't gotten around to experimenting with it yet.
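
The tradeoff described here could be written down as a small decision helper. This is only a sketch: `ByteLoadContext` and `shouldWidenByteLoad` are hypothetical names, and the rule for the non-zero-extended case simply restates the X86FixupBWInsts comment quoted earlier (innermost loop, not optimizing for size). It is not taken from either pass.

```cpp
// Hypothetical description of where an 8-bit load sits.
struct ByteLoadContext {
  bool FeedsZeroExtend; // the loaded byte is zero-extended afterwards
  bool InInnermostLoop; // the load is inside an innermost loop
  bool OptForSize;      // the function is being optimized for size
};

// Decides whether to turn a movb load into movzbl.
bool shouldWidenByteLoad(const ByteLoadContext &Ctx) {
  // movb + movzbl -> movzbl removes an instruction, so it is treated as
  // always profitable.
  if (Ctx.FeedsZeroExtend)
    return true;
  // Otherwise movzbl may encode larger than movb, so follow the
  // conservative rule: innermost loops only, and never when optimizing
  // for size.
  return Ctx.InInnermostLoop && !Ctx.OptForSize;
}
```

Keeping the zero-extend case separate reflects the point made above: that conversion never costs code size, while the bare movb case needs a heuristic.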

I'll compile a nice histogram from the diff data in
https://reviews.llvm.org/P7213 and https://reviews.llvm.org/P7214 .

That would be very nice! Like I said, though, I am less worried about the code size impact if you are only optimizing instructions that are already being zero extended.

This does mean, however, that this pass will not be an adequate replacement for FixupBWInsts until we add the raw movb --> movzbl transform.

Thanks,
Dave

Revision Contents

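Path                                            Size
lib/Target/X86/CMakeLists.txt                   1 line
lib/Target/X86/X86.h                            2 lines
lib/Target/X86/X86FixupZExt.cpp                 685 lines
lib/Target/X86/X86TargetMachine.cpp             21 lines
test/CodeGen/X86/avx-intrinsics-x86.ll          192 lines
test/CodeGen/X86/avx512-cmp.ll                  10 lines
test/CodeGen/X86/cmpxchg-i1.ll                  6 lines
test/CodeGen/X86/cmpxchg-i128-i1.ll             4 lines
test/CodeGen/X86/fast-isel-cmp.ll               36 lines
test/CodeGen/X86/fp128-cast.ll                  3 lines
test/CodeGen/X86/fp128-compare.ll               169 lines
test/CodeGen/X86/sse-intrinsics-fast-isel.ll    64 lines
test/CodeGen/X86/sse-intrinsics-x86.ll          65 lines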

Diff 67362

lib/Target/X86/CMakeLists.txt

(14 lines not shown)
 set(sources
   X86AsmPrinter.cpp
   X86CallFrameOptimization.cpp
   X86ExpandPseudo.cpp
   X86FastISel.cpp
   X86FixupBWInsts.cpp
   X86FixupLEAs.cpp
   X86FixupSetCC.cpp
+  X86FixupZExt.cpp
   X86FloatingPoint.cpp
   X86FrameLowering.cpp
   X86ISelDAGToDAG.cpp
   X86ISelLowering.cpp
   X86InstrInfo.cpp
   X86MCInstLower.cpp
   X86MachineFunctionInfo.cpp
   X86OptimizeLEAs.cpp
(21 lines not shown)

lib/Target/X86/X86.h

(56 lines not shown)

 /// Return a pass that removes redundant LEA instructions and redundant address
 /// recalculations.
 FunctionPass *createX86OptimizeLEAs();

 /// Return a pass that transforms setcc + movzx pairs into xor + setcc.
 FunctionPass *createX86FixupSetCC();

+FunctionPass *createX86FixupZExt();
+
 /// Return a pass that expands WinAlloca pseudo-instructions.
 FunctionPass *createX86WinAllocaExpander();

 /// Return a pass that optimizes the code-size of x86 call sequences. This is
 /// done by replacing esp-relative movs with pushes.
 FunctionPass *createX86CallFrameOptimization();

 /// Return an IR pass that inserts EH registration stack objects and explicit
(20 lines not shown)

lib/Target/X86/X86FixupZExt.cpp

This file was added.

				#include "X86Subtarget.h"
				#include "llvm/CodeGen/LiveIntervalAnalysis.h"
				#include "llvm/CodeGen/LivePhysRegs.h"
				#include "llvm/CodeGen/LiveRegMatrix.h"
				#include "llvm/CodeGen/MachineFunctionPass.h"
				#include "llvm/CodeGen/MachineInstrBuilder.h"
				#include "llvm/CodeGen/MachineRegisterInfo.h"
				#include "llvm/CodeGen/RegisterClassInfo.h"
				#include "llvm/CodeGen/VirtRegMap.h"

				#include <iterator>

				#define DEBUG_TYPE "x86-fixup-zext"

				namespace {
				using namespace llvm;
				using std::unique_ptr;
				using std::vector;
				using std::pair;
				using Segment = LiveRange::Segment;

				template <typename Elem, typename Container>
				using is_iterable_of = typename std::enable_if<std::is_same<
				typename std::decay<decltype(*std::declval<Container>().begin())>::type,
				Elem>::value>::type;

				template <typename T> auto push_to(T &t) -> decltype(std::back_inserter(t)) {
				return std::back_inserter(t);
				}

				unsigned get_phys(unsigned reg, const VirtRegMap &vrm) {
				return TargetRegisterInfo::isVirtualRegister(reg) ? vrm.getPhys(reg) : reg;
				}

				unsigned get_phys(const MachineOperand &regop, const VirtRegMap &vrm) {
				const auto *f = regop.getParent()->getParent()->getParent();
				const auto &tri = *f->getSubtarget().getRegisterInfo();
				assert(regop.isReg());
				unsigned preg = get_phys(regop.getReg(), vrm);
				return regop.getSubReg() ? tri.getSubReg(preg, regop.getSubReg()) : preg;
				}

				unsigned get_phys(const MachineInstr &i, unsigned opnum,
				const VirtRegMap &vrm) {
				return get_phys(i.getOperand(opnum), vrm);
				}

				DenseMap<MachineBasicBlock , MachineInstr >
				dominating_defs(unsigned gr8, const MachineRegisterInfo &mri,
				const SlotIndexes &si) {
				DenseMap<MachineBasicBlock , MachineInstr > defs;
				// at least until release_37, getInstructionIndex is expensive.
				DenseMap<MachineBasicBlock *, SlotIndex> cached;

				for (MachineInstr &def : mri.def_instructions(gr8)) {
				unsigned tied_use;
				if (def.isRegTiedToUseOperand(0, &tied_use) &&
				def.getOperand(tied_use).getReg() != def.getOperand(0).getReg()) {
				DEBUG(dbgs() << "dominating_defs: " << def.getOperand(0) << " is tied to "
				<< def.getOperand(tied_use) << "\n");
				return dominating_defs(def.getOperand(tied_use).getReg(), mri, si);
				}
				MachineBasicBlock *bb = def.getParent();
				if (defs.find(bb) == defs.end() \|\|
				si.getInstructionIndex(def) < cached.lookup(bb)) {
				cached[bb] = si.getInstructionIndex(def);
				defs[bb] = &def;
				}
				}
				return defs;
				}

				void add_seg(SlotIndex s, SlotIndex e, LiveInterval &live, LiveIntervals &li) {
				VNInfo *valno = !live.hasAtLeastOneValue()
				? live.getNextValue(s, li.getVNInfoAllocator())
				: *live.vni_begin();
				assert(live.getNumValNums() == 1);
				live.addSegment(Segment(std::move(s), std::move(e), valno));
				}

				void add_seg(MachineInstr &s, MachineInstr &e, LiveInterval &live,
				LiveIntervals &li) {
				return add_seg(li.getInstructionIndex(s), li.getInstructionIndex(e), live,
				li);
				}

				void add_segs(LiveInterval &src, LiveInterval &dest, LiveIntervals &li) {
				for (const Segment &s : src) {
				add_seg(s.start, s.end, dest, li);
				}
				}

				MachineInstr *insert_mov32r0(MachineInstr &def8, LiveInterval &live,
				LiveIntervals &li) {
				auto slot = [&](MachineInstr &i) { return li.getInstructionIndex(i); };
				const MachineFunction &f = *def8.getParent()->getParent();
				const auto &tri = f.getSubtarget().getRegisterInfo();
				MachineBasicBlock &bb = *def8.getParent();
				MachineBasicBlock::iterator ins = &def8;

				if (const Segment *eflagseg =
				li.getRegUnit(*MCRegUnitIterator(X86::EFLAGS, tri))
				.getSegmentContaining(slot(def8))) {
				if (eflagseg->start <= slot(*bb.begin()) && bb.isLiveIn(X86::EFLAGS)) {
				if (bb.pred_size() > 1) {
				return nullptr;
				}
				add_seg(li.getMBBStartIdx(&bb), slot(def8), live, li);
				return insert_mov32r0(*(*bb.pred_begin())->rbegin(), live, li);
				}
				ins = li.getInstructionFromIndex(eflagseg->start);
				}
				// insert dummy mov32r0
				MachineInstrBuilder mib =
				BuildMI(bb, ins, def8.getDebugLoc(),
				f.getSubtarget().getInstrInfo()->get(X86::MOV32r0), 0);
				return mib;
				}

				template <typename T, typename = is_iterable_of<LiveInterval *, T>>
				raw_ostream &operator<<(raw_ostream &out, const T &es) {
				for (LiveInterval *e : es) {
				out << "\t" << (*e) << "\n";
				}
				return out;
				}

				template <typename T, typename = is_iterable_of<LiveInterval *, T>>
				bool interferes(const T &as, const LiveInterval &b,
				const MachineRegisterInfo &mri) {
				return any_of(as, [&](const LiveInterval *a) { return a->overlaps(b); });
				}

				template <typename Iterator, typename Predicate>
				Iterator move_to_end_if(Iterator first, Iterator last, Predicate p) {
				Iterator rv = last;
				while (first != rv) {
				if (p(*first)) {
				--rv;
				std::swap(*first, *rv);
				} else {
				++first;
				}
				}
				return rv;
				}

				template <typename Range, typename Predicate>
				auto move_to_end_if(Range &r, Predicate p) -> decltype(r.end()) {
				return move_to_end_if(r.begin(), r.end(), std::move(p));
				}

				struct ReAllocTool {
				const TargetRegisterInfo *tri;
				const MachineRegisterInfo *mri;
				LiveRegMatrix *lrm;
				VirtRegMap *vrm;
				RegisterClassInfo rci;
				BitVector unused_csr;

				void add_reg_to_bv(BitVector &bv, MCPhysReg reg) const {
				for (MCRegAliasIterator r(reg, tri, true); r.isValid(); ++r) {
				bv.set(*r);
				}
				}

				BitVector bv_from_regs(ArrayRef<MCPhysReg> regs) const {
				BitVector rv(tri->getNumRegs());
				for (const MCPhysReg &r : regs) {
				add_reg_to_bv(rv, r);
				}
				return rv;
				}

				template <typename Predicate>
				BitVector bv_from_regs(ArrayRef<MCPhysReg> regs, Predicate p) const {
				BitVector rv(tri->getNumRegs());
				for (const MCPhysReg &r : regs) {
				if (p(r)) {
				add_reg_to_bv(rv, r);
				}
				}
				return rv;
				}

				ReAllocTool(const MachineFunction &f, LiveRegMatrix &lrm_, VirtRegMap &vrm_)
				: tri(f.getSubtarget().getRegisterInfo()), mri(&f.getRegInfo()),
				lrm(&lrm_), vrm(&vrm_), rci(), unused_csr(tri->getNumRegs()) {
				const MCPhysReg *csr = tri->getCalleeSavedRegs(&f);
				for (unsigned i = 0; csr[i] != 0; i += 1) {
				if (!lrm->isPhysRegUsed(csr[i])) {
				add_reg_to_bv(unused_csr, csr[i]);
				}
				}
				rci.runOnMachineFunction(f);
				}

				bool interf(LiveInterval &live, unsigned preg) const {
				return lrm->checkInterference(live, preg) != LiveRegMatrix::IK_Free;
				}

				template <typename T, typename = is_iterable_of<LiveInterval *, T>>
				bool interf(LiveInterval &live, unsigned preg, T &evictees) const {
				if (lrm->checkRegMaskInterference(live, preg) ||
				lrm->checkRegUnitInterference(live, preg)) {
				return true;
				}
				DenseSet<LiveInterval *> ev;
				for (MCRegUnitIterator regunit(preg, tri); regunit.isValid(); ++regunit) {
				LiveIntervalUnion::Query &q = lrm->query(live, *regunit);
				if (q.collectInterferingVRegs() > 0) {
				for (LiveInterval *l : q.interferingVRegs()) {
				ev.insert(l);
				}
				}
				}
				std::copy(ev.begin(), ev.end(), push_to(evictees));
				return evictees.size() > 0;
				}

				const MCPhysReg *alloc_next(LiveInterval &live,
				const BitVector *except = nullptr,
				ArrayRef<MCPhysReg>::iterator *it = nullptr,
				const TargetRegisterClass *rc = nullptr) const {
				ArrayRef<MCPhysReg> ord =
				rci.getOrder(rc ? rc : mri->getRegClass(live.reg));
				BitVector rs = unused_csr;
				if (except != nullptr) {
				rs |= *except;
				}
				auto rv = std::find_if(
				it ? std::next(*it) : ord.begin(), ord.end(),
				[&](MCPhysReg r) { return !rs.test(r) && !interf(live, r); });
				return rv == ord.end() ? nullptr : rv;
				}

				MCPhysReg alloc(LiveInterval &live, const BitVector *except = nullptr,
				const TargetRegisterClass *rc = nullptr) const {
				const MCPhysReg *rv = alloc_next(live, except, nullptr, rc);
				return rv == nullptr ? 0 : *rv;
				}

				// (re-)allocate a group of interfering intervals. brute force search. returns
				// nullptr if impossible.
				template <typename C, typename = is_iterable_of<LiveInterval *, C>>
				unique_ptr<vector<pair<LiveInterval *, const MCPhysReg *>>>
				alloc_interf_intervals(C group, const BitVector *except = nullptr) const {
				if (group.empty()) {
				return make_unique<vector<pair<LiveInterval *, const MCPhysReg *>>>();
				}
				auto assigned =
				make_unique<vector<pair<LiveInterval *, const MCPhysReg *>>>();

				auto maybe_unassign = [&](pair<LiveInterval *, const MCPhysReg *> &p) {
				if (p.second) {
				lrm->unassign(*p.first);
				}
				};

				auto maybe_assign = [&](pair<LiveInterval *, const MCPhysReg *> &p) {
				if (p.second) {
				lrm->assign(*p.first, *p.second);
				}
				};

				auto try_next_in_group = [&]() {
				assert(!group.empty());
				assigned->push_back(
				std::make_pair(group.back(), alloc_next(*group.back(), except)));
				group.pop_back();
				maybe_assign(assigned->back());
				};

				auto back_to_previous = [&]() {
				assert(!assigned->empty());
				maybe_unassign(assigned->back());
				group.push_back(assigned->back().first);
				assigned->pop_back();
				};

				auto try_next_reg = [&]() {
				assert(!assigned->empty());
				maybe_unassign(assigned->back());
				assigned->back().second =
				alloc_next(*assigned->back().first, except, &assigned->back().second);
				maybe_assign(assigned->back());
				};

				try_next_in_group();

				while (!group.empty() || assigned->back().second == nullptr) {
				if (assigned->back().second == nullptr) {
				back_to_previous();
				if (assigned->empty()) {
				return nullptr;
				}
				try_next_reg();
				} else {
				try_next_in_group();
				}
				}
				for (auto &p : *assigned) {
				lrm->unassign(*p.first);
				}
				return assigned;
				}

				template <typename C, typename = is_iterable_of<LiveInterval *, C>>
				unique_ptr<vector<MCPhysReg>>
				evict_intervals(const C &lives, const BitVector *excepts = nullptr) const {
				DenseMap<LiveInterval *, const MCPhysReg *> newmap;
				vector<LiveInterval *> ungrouped(lives.begin(), lives.end());

				while (!ungrouped.empty()) {
				vector<LiveInterval *> group;
				group.push_back(ungrouped.back());
				ungrouped.pop_back();
				bool done = false;
				while (!done) {
				auto it = move_to_end_if(ungrouped, [&](LiveInterval *h) {
				return interferes(group, *h, *mri);
				});
				done = it == ungrouped.end();
				std::copy(it, ungrouped.end(), push_to(group));
				ungrouped.erase(it, ungrouped.end());
				}
				if (auto newassigns = alloc_interf_intervals(group, excepts)) {
				for (auto pair_ : *newassigns) {
				newmap.insert(pair_);
				}
				} else {
				return nullptr;
				}
				}
				auto rv = make_unique<vector<MCPhysReg>>();
				transform(lives, push_to(*rv), [&](LiveInterval *l) { return *newmap[l]; });
				return rv;
				}

				MCPhysReg unassign(LiveInterval &live) {
				unsigned old = get_phys(live.reg, *vrm);
				lrm->unassign(live);
				return old;
				}

				template <typename C, typename = is_iterable_of<LiveInterval *, C>>
				vector<MCPhysReg> unassign_all(C &lives) {
				vector<MCPhysReg> r;
				transform(lives, push_to(r), [&](LiveInterval *l) { return unassign(*l); });
				return r;
				}

				template <typename C, typename D,
				typename = is_iterable_of<LiveInterval *, C>,
				typename = is_iterable_of<MCPhysReg, D>>
				void assign_all(C &lives, D &&regs) {
				for (auto intv_reg : zip_first(lives, std::forward<D>(regs))) {
				lrm->assign(*std::get<0>(intv_reg), std::get<1>(intv_reg));
				}
				}

				bool reserve_phys_reg(MCPhysReg preg, LiveInterval &live) {
				vector<LiveInterval *> evictees;
				if (!interf(live, preg, evictees)) {
				DEBUG(dbgs() << "ReAllocTool: " << tri->getName(preg)
				<< " is already free.\n");
				return true;
				} else if (evictees.size() > 0) {
				DEBUG(dbgs() << "ReAllocTool: trying to reserve " << tri->getName(preg)
				<< " by evicting:\n"
				<< evictees);
				vector<MCPhysReg> oldregs = unassign_all(evictees);
				BitVector bv = bv_from_regs(preg);
				if (auto newregs = evict_intervals(evictees, &bv)) {
				assign_all(evictees, *newregs);
				return true;
				}
				assign_all(evictees, oldregs);
				}
				DEBUG(dbgs() << "ReAllocTool: unable to reserve " << tri->getName(preg)
				<< "\n");
				return false;
				}
				};

				struct Candidate {
				MachineInstr *ins;
				MachineInstr *gr8def;
				MachineInstr *movzx;
				vector<MCPhysReg> constraints;
				LiveInterval *live32;
				LiveInterval *live8;
				unique_ptr<LiveInterval> extra;
				// private:
				// assign/reassign
				unsigned pdest;
				unsigned psrc;

				static MachineInstr *valid_candidate(MachineInstr &i, LiveIntervals &li) {
				if (i.getOpcode() != X86::MOVZX32rr8 || i.getOperand(1).getSubReg() != 0) {
				return nullptr;
				}

				const MachineFunction &f = *i.getParent()->getParent();
				const MachineRegisterInfo &mri = f.getRegInfo();
				const TargetRegisterInfo &tri = *f.getSubtarget().getRegisterInfo();

				unsigned src = i.getOperand(1).getReg();
				auto bbdefs = dominating_defs(src, mri, *li.getSlotIndexes());
				if (bbdefs.size() > 1 || (mri.getSimpleHint(src) &&
				!tri.isVirtualRegister(mri.getSimpleHint(src)))) {
				DEBUG(dbgs() << "passing over " << i << "defs: " << bbdefs.size()
				<< ", gr8 hint: " << PrintReg(mri.getSimpleHint(src), &tri)
				<< "\n");
				return nullptr;
				}
				return bbdefs.begin()->second;
				}

				static unique_ptr<Candidate> from_mi(MachineInstr &i, LiveIntervals &li,
				const VirtRegMap &vrm) {
				const MachineFunction &f = *i.getParent()->getParent();
				const MachineRegisterInfo &mri = f.getRegInfo();
				const TargetRegisterInfo &tri = *f.getSubtarget().getRegisterInfo();

				MachineInstr *def, *ins;
				if ((def = valid_candidate(i, li)) == nullptr) {
				return nullptr;
				}

				unsigned dest = i.getOperand(0).getReg(), src = i.getOperand(1).getReg();
				LiveInterval &live32 = li.getInterval(dest), &live8 = li.getInterval(src);
				unique_ptr<LiveInterval> extra(new LiveInterval(live32.reg, live32.weight));

				if ((ins = insert_mov32r0(*def, *extra, li)) == nullptr) {
				return nullptr;
				}

				li.InsertMachineInstrInMaps(*ins);
				add_seg(*ins, *def, *extra, li);
				if (extra->overlaps(live32)) {
				li.RemoveMachineInstrFromMaps(*ins);
				ins->eraseFromParent();
				return nullptr;
				}

				add_segs(live32, *extra, li);
				add_segs(live8, *extra, li);

				// look for copy instr reg alloc hints
				vector<MCPhysReg> cx;
				for (const MachineInstr &use : mri.use_instructions(dest)) {
				if (use.isCopy() && !tri.isVirtualRegister(use.getOperand(0).getReg())) {
				unsigned r =
				use.getOperand(1).getSubReg()
				? tri.getMatchingSuperReg(use.getOperand(0).getReg(),
				use.getOperand(1).getSubReg(),
				mri.getRegClass(dest))
				: get_phys(use.getOperand(0), vrm);
				if (f.getSubtarget<X86Subtarget>().is64Bit() ||
				X86::GR32_ABCDRegClass.contains(r)) {
				cx.push_back(r);
				}
				}
				}

				return unique_ptr<Candidate>(new Candidate{
				ins, def, &i, std::move(cx), &live32, &live8, std::move(extra), 0, 0});
				}

				bool operator<(const Candidate &b) const {
				if (constraints.size() > 0 && b.constraints.size() == 0)
				return true;
				if (b.constraints.size() > 0 && constraints.size() == 0)
				return false;
				if (constraints.size() < b.constraints.size())
				return true;
				return li_size() > b.li_size();
				}

				unsigned li_size() const { return extra->getSize(); }

				friend raw_ostream &operator<<(raw_ostream &out, const Candidate &c) {
				out << "Candidate:\n\tinserted: " << (*c.ins)
				<< "\tgr8 def: " << (*c.gr8def) << "\tmovzx: " << (*c.movzx)
				<< "\txor gr32: " << (*c.extra);
				if (c.constraints.size() > 0) {
				out << "\n\tconstraints:";
				for (unsigned cx : c.constraints) {
				out << " " << PrintReg(cx, &c.tri());
				}
				} else {
				out << "\n\tno constraints.";
				}
				return out;
				}

				const X86RegisterInfo &tri() const {
				return *reinterpret_cast<const X86RegisterInfo *>(
				ins->getParent()->getParent()->getSubtarget().getRegisterInfo());
				}

				const X86InstrInfo &tii() const {
				return *reinterpret_cast<const X86InstrInfo *>(
				ins->getParent()->getParent()->getSubtarget().getInstrInfo());
				}

				MachineRegisterInfo &mri() const {
				return ins->getParent()->getParent()->getRegInfo();
				}

				void unassign(ReAllocTool &ratool) {
				pdest = ratool.unassign(*live32);
				psrc = ratool.unassign(*live8);
				}

				void assign_old(LiveRegMatrix &lrm) {
				lrm.assign(*live32, pdest);
				lrm.assign(*live8, psrc);
				pdest = psrc = 0;
				}

				void assign_new(LiveRegMatrix &lrm, LiveIntervals &li, MCPhysReg newdest) {
				// vsrc uses => vdest:sub_8bit; insert vdest = mov32r0; del movzx
				unsigned vdest = movzx->getOperand(0).getReg();
				unsigned vsrc = movzx->getOperand(1).getReg();

				// in-place operand mutation would confuse defusechain_iterator
				vector<MachineOperand *> ops;
				transform(mri().reg_operands(vsrc), push_to(ops),
				[](MachineOperand &op) { return &op; });
				for (MachineOperand *op : ops) {
				DEBUG(dbgs() << "changing " << (*op->getParent()));
				op->substVirtReg(vdest, X86::sub_8bit, tri());
				DEBUG(dbgs() << "to " << (*op->getParent()));
				}

				li.RemoveMachineInstrFromMaps(*movzx);
				movzx->eraseFromParent();
				li.removeInterval(vsrc);
				li.removeInterval(vdest);

				const TargetRegisterClass &destcls = *mri().getRegClass(vdest);
				ins->getOperand(0).setReg(vdest);
				if (destcls.getSize() > 32 / 8) {
				ins->getOperand(0).setSubReg(X86::sub_32bit);
				ins->getOperand(0).setIsUndef();
				}
				if (const TargetRegisterClass *newcls = gr8def->getRegClassConstraintEffect(
				0, ins->getRegClassConstraintEffect(0, &destcls, &tii(), &tri()),
				&tii(), &tri())) {
				DEBUG(dbgs() << "updating reg class from "
				<< tri().getRegClassName(&destcls) << " to "
				<< tri().getRegClassName(newcls) << "\n");
				mri().setRegClass(vdest, newcls);
				} else {
				DEBUG(dbgs() << "not updating reg class\n");
				}
				lrm.assign(li.createAndComputeVirtRegInterval(vdest), newdest);
				}

				bool valid_dest_reg(MCPhysReg physreg) const {
				return mri().getRegClass(movzx->getOperand(0).getReg())->contains(physreg);
				}
				};

				struct X86FixupZExt : public MachineFunctionPass {
				static char id;

				X86FixupZExt() : MachineFunctionPass(id) {}

				const char *getPassName() const override {
				return "X86 Zero-Extension Fix-up";
				}

				void getAnalysisUsage(AnalysisUsage &a) const override {
				a.addRequired<LiveRegMatrix>();
				a.addRequired<VirtRegMap>();
				a.addRequired<LiveIntervals>();
				a.setPreservesAll();
				return MachineFunctionPass::getAnalysisUsage(a);
				}

				bool runOnMachineFunction(MachineFunction &f) override {
				VirtRegMap &vrm = getAnalysis<VirtRegMap>();
				LiveIntervals &li = getAnalysis<LiveIntervals>();
				LiveRegMatrix &lrm = getAnalysis<LiveRegMatrix>();
				vector<Candidate> constrained, cands, dispose;
				ReAllocTool ratool(f, lrm, vrm);

				DEBUG(dbgs() << "analyzing " << f.getName() << "'s movzxes.\n");
				for (MachineBasicBlock &bb : f) {
				for (MachineInstr &i : bb) {
				if (auto cand = Candidate::from_mi(i, li, vrm)) {
				if (cand->constraints.size() > 0) {
				constrained.emplace_back(std::move(*cand.release()));
				} else {
				cands.emplace_back(std::move(*cand.release()));
				}
				}
				}
				}

				BitVector nosub8;
				if (f.getSubtarget<X86Subtarget>().is64Bit()) {
				nosub8 = ratool.bv_from_regs({X86::RIP});
				} else {
				nosub8 = ratool.bv_from_regs(ArrayRef<MCPhysReg>(
				X86::GR32_ABCDRegClass.begin(), X86::GR32_ABCDRegClass.end()));
				nosub8.flip();
				}

				DEBUG(vrm.print(dbgs()));
				DEBUG(f.print(dbgs(), li.getSlotIndexes()));
				std::sort(constrained.begin(), constrained.end());
				std::for_each(constrained.begin(), constrained.end(), [&](Candidate &c) {
				DEBUG(dbgs() << c << "\n");
				c.unassign(ratool);
				bool demote = true;
				for (MCPhysReg preg : c.constraints) {
				if (!nosub8.test(preg) && c.valid_dest_reg(preg) &&
				ratool.reserve_phys_reg(preg, *c.extra)) {
				DEBUG(dbgs() << "works\n");
				c.assign_new(lrm, li, preg);
				return;
				}
				// only demote if RA pass missed all hints
				demote &= preg != c.pdest;
				}
				DEBUG(dbgs() << "could not transform\n");
				c.assign_old(lrm);
				if (demote) {
				c.constraints.clear();
				DEBUG(dbgs() << "demoting to unconstrained candidate\n");
				cands.push_back(std::move(c));
				} else {
				dispose.push_back(std::move(c));
				}
				});

				auto try_harder_to_alloc = [&](Candidate &c) {
				for (MCPhysReg newreg : X86::GR32_ABCDRegClass) {
				if (c.valid_dest_reg(newreg) && !ratool.unused_csr.test(newreg) &&
				ratool.reserve_phys_reg(newreg, *c.extra)) {
				return newreg;
				}
				}
				return static_cast<MCPhysReg>(0);
				};

				std::sort(cands.begin(), cands.end());
				for (Candidate &c : cands) {
				DEBUG(dbgs() << c << "\n");
				c.unassign(ratool);
				MCPhysReg newreg;
				if (!f.getSubtarget<X86Subtarget>().is64Bit() &&
				((newreg = ratool.alloc(*c.extra, &nosub8)) != 0 ||
				(newreg = try_harder_to_alloc(c)) != 0)) {
				DEBUG(dbgs() << "works\n");
				c.assign_new(lrm, li, newreg);
				} else if (f.getSubtarget<X86Subtarget>().is64Bit() &&
				(newreg = ratool.alloc(*c.extra, &nosub8)) != 0) {
				DEBUG(dbgs() << "works\n");
				c.assign_new(lrm, li, newreg);
				} else {
				DEBUG(dbgs() << "could not transform\n");
				c.assign_old(lrm);
				dispose.push_back(std::move(c));
				}
				}

				for (Candidate &c : dispose) {
				DEBUG(dbgs() << "purging dummy instr: " << (*c.ins));
				li.RemoveMachineInstrFromMaps(*c.ins);
				c.ins->eraseFromParent();
				}
				return false;
				}
				};

				char X86FixupZExt::id = 0;
				}

				namespace llvm {
				FunctionPass *createX86FixupZExt() { return new X86FixupZExt(); }
				}
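
The brute-force search in `alloc_interf_intervals` above places intervals one at a time and, when an interval has no register left, returns to the previous interval and advances it to its next candidate. The following is a minimal standalone sketch of that backtracking shape only. `Interval`, `overlaps`, `fits`, and `assign_registers` are hypothetical names invented for illustration, registers are modeled as plain integers, and none of this uses the LLVM `LiveRegMatrix` API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a live interval: a half-open [start, end) range.
struct Interval {
  int start;
  int end;
};

inline bool overlaps(const Interval &a, const Interval &b) {
  return a.start < b.end && b.start < a.end;
}

// True if `reg` can hold intervals[idx] given the registers already chosen
// for intervals[0..idx).
inline bool fits(const std::vector<Interval> &intervals,
                 const std::vector<int> &chosen, std::size_t idx, int reg) {
  for (std::size_t j = 0; j < idx; ++j)
    if (chosen[j] == reg && overlaps(intervals[j], intervals[idx]))
      return false;
  return true;
}

// Iterative backtracking over registers 0..num_regs-1. On success, chosen[i]
// holds the register picked for intervals[i]. Returns false once every
// combination has been ruled out.
inline bool assign_registers(const std::vector<Interval> &intervals,
                             int num_regs, std::vector<int> &chosen) {
  assert(num_regs >= 0);
  chosen.clear();
  int next = 0; // first register to try for the interval being placed
  while (chosen.size() < intervals.size()) {
    std::size_t idx = chosen.size();
    int reg = next;
    while (reg < num_regs && !fits(intervals, chosen, idx, reg))
      ++reg;
    if (reg < num_regs) {
      chosen.push_back(reg); // placed; move on to the next interval
      next = 0;
    } else if (chosen.empty()) {
      return false; // the first interval ran out of options
    } else {
      next = chosen.back() + 1; // backtrack; resume after the old choice
      chosen.pop_back();
    }
  }
  return true;
}
```

With two registers, the chain `{0,10}, {5,15}, {12,20}` receives registers 0, 1, 0, while three mutually overlapping intervals cannot be placed and the function returns false. The worst case is exponential in the group size, which is presumably why the pass only applies the search to small groups of mutually interfering intervals.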

lib/Target/X86/X86TargetMachine.cpp

//===-- X86TargetMachine.cpp - Define TargetMachine for the X86 -----------===//		//===-- X86TargetMachine.cpp - Define TargetMachine for the X86 -----------===//
//		//
// The LLVM Compiler Infrastructure		// The LLVM Compiler Infrastructure
//		//
// This file is distributed under the University of Illinois Open Source		// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.		// License. See LICENSE.TXT for details.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This file defines the X86 specific subclass of TargetMachine.		// This file defines the X86 specific subclass of TargetMachine.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "X86TargetMachine.h"
#include "X86.h"		#include "X86.h"
		#include "X86TargetMachine.h"
#include "X86TargetObjectFile.h"		#include "X86TargetObjectFile.h"
#include "X86TargetTransformInfo.h"		#include "X86TargetTransformInfo.h"
#include "llvm/CodeGen/Passes.h"		#include "llvm/CodeGen/Passes.h"
#include "llvm/CodeGen/TargetPassConfig.h"		#include "llvm/CodeGen/TargetPassConfig.h"
#include "llvm/IR/Function.h"		#include "llvm/IR/Function.h"
#include "llvm/IR/LegacyPassManager.h"		#include "llvm/IR/LegacyPassManager.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
#include "llvm/Support/FormattedStream.h"		#include "llvm/Support/FormattedStream.h"
#include "llvm/Support/TargetRegistry.h"		#include "llvm/Support/TargetRegistry.h"
#include "llvm/Target/TargetOptions.h"		#include "llvm/Target/TargetOptions.h"
using namespace llvm;		using namespace llvm;

static cl::opt<bool> EnableMachineCombinerPass("x86-machine-combiner",		static cl::opt<bool> EnableMachineCombinerPass("x86-machine-combiner",
cl::desc("Enable the machine combiner pass"),		cl::desc("Enable the machine combiner pass"),
cl::init(true), cl::Hidden);		cl::init(true), cl::Hidden);

		static cl::opt<bool> EnableSetCCFixup("setcc-fixup",
		cl::desc("Apply X86FixupSetCC"),
		cl::init(false), cl::Hidden);

namespace llvm {		namespace llvm {
void initializeWinEHStatePassPass(PassRegistry &);		void initializeWinEHStatePassPass(PassRegistry &);
}		}

extern "C" void LLVMInitializeX86Target() {		extern "C" void LLVMInitializeX86Target() {
// Register the target.		// Register the target.
RegisterTargetMachine<X86TargetMachine> X(TheX86_32Target);		RegisterTargetMachine<X86TargetMachine> X(TheX86_32Target);
RegisterTargetMachine<X86TargetMachine> Y(TheX86_64Target);		RegisterTargetMachine<X86TargetMachine> Y(TheX86_64Target);
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

TargetIRAnalysis X86TargetMachine::getTargetIRAnalysis() {		TargetIRAnalysis X86TargetMachine::getTargetIRAnalysis() {
return TargetIRAnalysis([this](const Function &F) {		return TargetIRAnalysis([this](const Function &F) {
return TargetTransformInfo(X86TTIImpl(this, F));		return TargetTransformInfo(X86TTIImpl(this, F));
});		});
}		}


//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Pass Pipeline Configuration		// Pass Pipeline Configuration
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

namespace {		namespace {
/// X86 Code Generator Pass Configuration Options.		/// X86 Code Generator Pass Configuration Options.
class X86PassConfig : public TargetPassConfig {		class X86PassConfig : public TargetPassConfig {
public:		public:
X86PassConfig(X86TargetMachine *TM, PassManagerBase &PM)		X86PassConfig(X86TargetMachine *TM, PassManagerBase &PM)
: TargetPassConfig(TM, PM) {}		: TargetPassConfig(TM, PM) {}

X86TargetMachine &getX86TargetMachine() const {		X86TargetMachine &getX86TargetMachine() const {
return getTM<X86TargetMachine>();		return getTM<X86TargetMachine>();
}		}

void addIRPasses() override;		void addIRPasses() override;
bool addInstSelector() override;		bool addInstSelector() override;
bool addILPOpts() override;		bool addILPOpts() override;
bool addPreISel() override;		bool addPreISel() override;
void addPreRegAlloc() override;		void addPreRegAlloc() override;
void addPostRegAlloc() override;		void addPostRegAlloc() override;
		bool addPreRewrite() override;
void addPreEmitPass() override;		void addPreEmitPass() override;
void addPreSched2() override;		void addPreSched2() override;
};		};
} // namespace		} // namespace

TargetPassConfig *X86TargetMachine::createPassConfig(PassManagerBase &PM) {		TargetPassConfig *X86TargetMachine::createPassConfig(PassManagerBase &PM) {
return new X86PassConfig(this, PM);		return new X86PassConfig(this, PM);
}		}
bool X86PassConfig::addPreISel() {
const Triple &TT = TM->getTargetTriple();		const Triple &TT = TM->getTargetTriple();
if (TT.isOSWindows() && TT.getArch() == Triple::x86)		if (TT.isOSWindows() && TT.getArch() == Triple::x86)
addPass(createX86WinEHStatePass());		addPass(createX86WinEHStatePass());
return true;		return true;
}		}

void X86PassConfig::addPreRegAlloc() {		void X86PassConfig::addPreRegAlloc() {
if (getOptLevel() != CodeGenOpt::None) {		if (getOptLevel() != CodeGenOpt::None) {
		if (EnableSetCCFixup) {
addPass(createX86FixupSetCC());		addPass(createX86FixupSetCC());
		}
addPass(createX86OptimizeLEAs());		addPass(createX86OptimizeLEAs());
}		}

addPass(createX86CallFrameOptimization());		addPass(createX86CallFrameOptimization());
addPass(createX86WinAllocaExpander());		addPass(createX86WinAllocaExpander());
}		}

void X86PassConfig::addPostRegAlloc() {		void X86PassConfig::addPostRegAlloc() {
addPass(createX86FloatingPointStackifierPass());		addPass(createX86FloatingPointStackifierPass());
}		}

		bool X86PassConfig::addPreRewrite() {
		if (!EnableSetCCFixup) {
		addPass(createX86FixupZExt());
		}
		return false;
		}

void X86PassConfig::addPreSched2() { addPass(createX86ExpandPseudoPass()); }		void X86PassConfig::addPreSched2() { addPass(createX86ExpandPseudoPass()); }

void X86PassConfig::addPreEmitPass() {		void X86PassConfig::addPreEmitPass() {
if (getOptLevel() != CodeGenOpt::None)		if (getOptLevel() != CodeGenOpt::None)
addPass(createExecutionDependencyFixPass(&X86::VR128RegClass));		addPass(createExecutionDependencyFixPass(&X86::VR128RegClass));

if (UseVZeroUpper)		if (UseVZeroUpper)
addPass(createX86IssueVZeroUpperPass());		addPass(createX86IssueVZeroUpperPass());

if (getOptLevel() != CodeGenOpt::None) {		if (getOptLevel() != CodeGenOpt::None) {
addPass(createX86FixupBWInsts());		addPass(createX86FixupBWInsts());
addPass(createX86PadShortFunctions());		addPass(createX86PadShortFunctions());
addPass(createX86FixupLEAs());		addPass(createX86FixupLEAs());
}		}
}		}
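
The test updates that follow replace a trailing `movzbl` with a leading `xorl` on the destination register. The sketch below models why the two sequences leave the same 32-bit value: a byte write preserves the upper 24 bits, so zeroing the register first makes the later byte definition behave like a zero extension. `Reg32`, `byte_then_movzx`, and `zero_then_byte` are hypothetical names for illustration only. The model ignores EFLAGS, which is why the real pass has to place the zeroing instruction ahead of the flag-defining instruction.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a 32-bit register whose low byte can be written
// separately, as with %eax and %al.
struct Reg32 {
  std::uint32_t bits;
  void write_low8(std::uint8_t v) { bits = (bits & 0xFFFFFF00u) | v; }
  std::uint8_t low8() const { return static_cast<std::uint8_t>(bits & 0xFFu); }
};

// Old sequence: define the byte in a scratch register, then zero-extend it
// into the destination (setcc %cl; movzbl %cl, %eax).
inline std::uint32_t byte_then_movzx(std::uint32_t stale_dest,
                                     std::uint8_t value) {
  Reg32 scratch{0xDEADBEEFu}; // arbitrary stale contents
  scratch.write_low8(value);
  Reg32 dest{stale_dest};
  dest.bits = scratch.low8(); // movzx overwrites all 32 bits
  return dest.bits;
}

// New sequence: zero the destination first, then define only its low byte
// (xorl %eax, %eax; setcc %al).
inline std::uint32_t zero_then_byte(std::uint32_t stale_dest,
                                    std::uint8_t value) {
  Reg32 dest{stale_dest};
  dest.bits = 0;          // xor reg, reg
  dest.write_low8(value); // upper 24 bits stay zero
  return dest.bits;
}
```

Both functions return the byte value zero-extended to 32 bits regardless of what the destination held beforehand, so the rewritten sequence saves the `movzbl` without changing the result.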

test/CodeGen/X86/avx-intrinsics-x86.ll

; AVX512VL-NEXT: retl
ret <2 x double> %res		ret <2 x double> %res
}		}
declare <2 x double> @llvm.x86.sse2.cmp.sd(<2 x double>, <2 x double>, i8) nounwind readnone		declare <2 x double> @llvm.x86.sse2.cmp.sd(<2 x double>, <2 x double>, i8) nounwind readnone


define i32 @test_x86_sse2_comieq_sd(<2 x double> %a0, <2 x double> %a1) {		define i32 @test_x86_sse2_comieq_sd(<2 x double> %a0, <2 x double> %a1) {
; AVX-LABEL: test_x86_sse2_comieq_sd:		; AVX-LABEL: test_x86_sse2_comieq_sd:
; AVX: ## BB#0:		; AVX: ## BB#0:
		; AVX-NEXT: xorl %eax, %eax
; AVX-NEXT: vcomisd %xmm1, %xmm0		; AVX-NEXT: vcomisd %xmm1, %xmm0
; AVX-NEXT: setnp %al		; AVX-NEXT: setnp %cl
; AVX-NEXT: sete %cl		; AVX-NEXT: sete %al
; AVX-NEXT: andb %al, %cl		; AVX-NEXT: andb %cl, %al
; AVX-NEXT: movzbl %cl, %eax
; AVX-NEXT: retl		; AVX-NEXT: retl
;		;
; AVX512VL-LABEL: test_x86_sse2_comieq_sd:		; AVX512VL-LABEL: test_x86_sse2_comieq_sd:
; AVX512VL: ## BB#0:		; AVX512VL: ## BB#0:
		; AVX512VL-NEXT: xorl %eax, %eax
; AVX512VL-NEXT: vcomisd %xmm1, %xmm0		; AVX512VL-NEXT: vcomisd %xmm1, %xmm0
; AVX512VL-NEXT: setnp %al		; AVX512VL-NEXT: setnp %cl
; AVX512VL-NEXT: sete %cl		; AVX512VL-NEXT: sete %al
; AVX512VL-NEXT: andb %al, %cl		; AVX512VL-NEXT: andb %cl, %al
; AVX512VL-NEXT: movzbl %cl, %eax
; AVX512VL-NEXT: retl		; AVX512VL-NEXT: retl
%res = call i32 @llvm.x86.sse2.comieq.sd(<2 x double> %a0, <2 x double> %a1) ; <i32> [#uses=1]		%res = call i32 @llvm.x86.sse2.comieq.sd(<2 x double> %a0, <2 x double> %a1) ; <i32> [#uses=1]
ret i32 %res		ret i32 %res
}		}
declare i32 @llvm.x86.sse2.comieq.sd(<2 x double>, <2 x double>) nounwind readnone		declare i32 @llvm.x86.sse2.comieq.sd(<2 x double>, <2 x double>) nounwind readnone


define i32 @test_x86_sse2_comige_sd(<2 x double> %a0, <2 x double> %a1) {		define i32 @test_x86_sse2_comige_sd(<2 x double> %a0, <2 x double> %a1) {
; AVX512VL-NEXT: retl
ret i32 %res		ret i32 %res
}		}
declare i32 @llvm.x86.sse2.comilt.sd(<2 x double>, <2 x double>) nounwind readnone		declare i32 @llvm.x86.sse2.comilt.sd(<2 x double>, <2 x double>) nounwind readnone


define i32 @test_x86_sse2_comineq_sd(<2 x double> %a0, <2 x double> %a1) {		define i32 @test_x86_sse2_comineq_sd(<2 x double> %a0, <2 x double> %a1) {
; AVX-LABEL: test_x86_sse2_comineq_sd:		; AVX-LABEL: test_x86_sse2_comineq_sd:
; AVX: ## BB#0:		; AVX: ## BB#0:
		; AVX-NEXT: xorl %eax, %eax
; AVX-NEXT: vcomisd %xmm1, %xmm0		; AVX-NEXT: vcomisd %xmm1, %xmm0
; AVX-NEXT: setp %al		; AVX-NEXT: setp %cl
; AVX-NEXT: setne %cl		; AVX-NEXT: setne %al
; AVX-NEXT: orb %al, %cl		; AVX-NEXT: orb %cl, %al
; AVX-NEXT: movzbl %cl, %eax
; AVX-NEXT: retl		; AVX-NEXT: retl
;		;
; AVX512VL-LABEL: test_x86_sse2_comineq_sd:		; AVX512VL-LABEL: test_x86_sse2_comineq_sd:
; AVX512VL: ## BB#0:		; AVX512VL: ## BB#0:
		; AVX512VL-NEXT: xorl %eax, %eax
; AVX512VL-NEXT: vcomisd %xmm1, %xmm0		; AVX512VL-NEXT: vcomisd %xmm1, %xmm0
; AVX512VL-NEXT: setp %al		; AVX512VL-NEXT: setp %cl
; AVX512VL-NEXT: setne %cl		; AVX512VL-NEXT: setne %al
; AVX512VL-NEXT: orb %al, %cl		; AVX512VL-NEXT: orb %cl, %al
; AVX512VL-NEXT: movzbl %cl, %eax
; AVX512VL-NEXT: retl		; AVX512VL-NEXT: retl
%res = call i32 @llvm.x86.sse2.comineq.sd(<2 x double> %a0, <2 x double> %a1) ; <i32> [#uses=1]		%res = call i32 @llvm.x86.sse2.comineq.sd(<2 x double> %a0, <2 x double> %a1) ; <i32> [#uses=1]
ret i32 %res		ret i32 %res
}		}
declare i32 @llvm.x86.sse2.comineq.sd(<2 x double>, <2 x double>) nounwind readnone		declare i32 @llvm.x86.sse2.comineq.sd(<2 x double>, <2 x double>) nounwind readnone


define <4 x float> @test_x86_sse2_cvtdq2ps(<4 x i32> %a0) {		define <4 x float> @test_x86_sse2_cvtdq2ps(<4 x i32> %a0) {
; AVX512VL-NEXT: retl
ret <2 x double> %res		ret <2 x double> %res
}		}
declare <2 x double> @llvm.x86.sse2.sub.sd(<2 x double>, <2 x double>) nounwind readnone		declare <2 x double> @llvm.x86.sse2.sub.sd(<2 x double>, <2 x double>) nounwind readnone


define i32 @test_x86_sse2_ucomieq_sd(<2 x double> %a0, <2 x double> %a1) {		define i32 @test_x86_sse2_ucomieq_sd(<2 x double> %a0, <2 x double> %a1) {
; AVX-LABEL: test_x86_sse2_ucomieq_sd:		; AVX-LABEL: test_x86_sse2_ucomieq_sd:
; AVX: ## BB#0:		; AVX: ## BB#0:
		; AVX-NEXT: xorl %eax, %eax
; AVX-NEXT: vucomisd %xmm1, %xmm0		; AVX-NEXT: vucomisd %xmm1, %xmm0
; AVX-NEXT: setnp %al		; AVX-NEXT: setnp %cl
; AVX-NEXT: sete %cl		; AVX-NEXT: sete %al
; AVX-NEXT: andb %al, %cl		; AVX-NEXT: andb %cl, %al
; AVX-NEXT: movzbl %cl, %eax
; AVX-NEXT: retl		; AVX-NEXT: retl
;		;
; AVX512VL-LABEL: test_x86_sse2_ucomieq_sd:		; AVX512VL-LABEL: test_x86_sse2_ucomieq_sd:
; AVX512VL: ## BB#0:		; AVX512VL: ## BB#0:
		; AVX512VL-NEXT: xorl %eax, %eax
; AVX512VL-NEXT: vucomisd %xmm1, %xmm0		; AVX512VL-NEXT: vucomisd %xmm1, %xmm0
; AVX512VL-NEXT: setnp %al		; AVX512VL-NEXT: setnp %cl
; AVX512VL-NEXT: sete %cl		; AVX512VL-NEXT: sete %al
; AVX512VL-NEXT: andb %al, %cl		; AVX512VL-NEXT: andb %cl, %al
; AVX512VL-NEXT: movzbl %cl, %eax
; AVX512VL-NEXT: retl		; AVX512VL-NEXT: retl
%res = call i32 @llvm.x86.sse2.ucomieq.sd(<2 x double> %a0, <2 x double> %a1) ; <i32> [#uses=1]		%res = call i32 @llvm.x86.sse2.ucomieq.sd(<2 x double> %a0, <2 x double> %a1) ; <i32> [#uses=1]
ret i32 %res		ret i32 %res
}		}
declare i32 @llvm.x86.sse2.ucomieq.sd(<2 x double>, <2 x double>) nounwind readnone		declare i32 @llvm.x86.sse2.ucomieq.sd(<2 x double>, <2 x double>) nounwind readnone


define i32 @test_x86_sse2_ucomige_sd(<2 x double> %a0, <2 x double> %a1) {		define i32 @test_x86_sse2_ucomige_sd(<2 x double> %a0, <2 x double> %a1) {
[... 74 lines not shown ...]
; AVX512VL-NEXT: retl
ret i32 %res		ret i32 %res
}		}
declare i32 @llvm.x86.sse2.ucomilt.sd(<2 x double>, <2 x double>) nounwind readnone		declare i32 @llvm.x86.sse2.ucomilt.sd(<2 x double>, <2 x double>) nounwind readnone


define i32 @test_x86_sse2_ucomineq_sd(<2 x double> %a0, <2 x double> %a1) {		define i32 @test_x86_sse2_ucomineq_sd(<2 x double> %a0, <2 x double> %a1) {
; AVX-LABEL: test_x86_sse2_ucomineq_sd:		; AVX-LABEL: test_x86_sse2_ucomineq_sd:
; AVX: ## BB#0:		; AVX: ## BB#0:
		; AVX-NEXT: xorl %eax, %eax
; AVX-NEXT: vucomisd %xmm1, %xmm0		; AVX-NEXT: vucomisd %xmm1, %xmm0
; AVX-NEXT: setp %al		; AVX-NEXT: setp %cl
; AVX-NEXT: setne %cl		; AVX-NEXT: setne %al
; AVX-NEXT: orb %al, %cl		; AVX-NEXT: orb %cl, %al
; AVX-NEXT: movzbl %cl, %eax
; AVX-NEXT: retl		; AVX-NEXT: retl
;		;
; AVX512VL-LABEL: test_x86_sse2_ucomineq_sd:		; AVX512VL-LABEL: test_x86_sse2_ucomineq_sd:
; AVX512VL: ## BB#0:		; AVX512VL: ## BB#0:
		; AVX512VL-NEXT: xorl %eax, %eax
; AVX512VL-NEXT: vucomisd %xmm1, %xmm0		; AVX512VL-NEXT: vucomisd %xmm1, %xmm0
; AVX512VL-NEXT: setp %al		; AVX512VL-NEXT: setp %cl
; AVX512VL-NEXT: setne %cl		; AVX512VL-NEXT: setne %al
; AVX512VL-NEXT: orb %al, %cl		; AVX512VL-NEXT: orb %cl, %al
; AVX512VL-NEXT: movzbl %cl, %eax
; AVX512VL-NEXT: retl		; AVX512VL-NEXT: retl
%res = call i32 @llvm.x86.sse2.ucomineq.sd(<2 x double> %a0, <2 x double> %a1) ; <i32> [#uses=1]		%res = call i32 @llvm.x86.sse2.ucomineq.sd(<2 x double> %a0, <2 x double> %a1) ; <i32> [#uses=1]
ret i32 %res		ret i32 %res
}		}
declare i32 @llvm.x86.sse2.ucomineq.sd(<2 x double>, <2 x double>) nounwind readnone		declare i32 @llvm.x86.sse2.ucomineq.sd(<2 x double>, <2 x double>) nounwind readnone


define <2 x double> @test_x86_sse3_addsub_pd(<2 x double> %a0, <2 x double> %a1) {		define <2 x double> @test_x86_sse3_addsub_pd(<2 x double> %a0, <2 x double> %a1) {
[... 569 lines not shown ...]
; AVX512VL-NEXT: retl		; AVX512VL-NEXT: retl
%1 = load <16 x i8>, <16 x i8>* %a0		%1 = load <16 x i8>, <16 x i8>* %a0
%2 = load <16 x i8>, <16 x i8>* %a2		%2 = load <16 x i8>, <16 x i8>* %a2
%res = call i32 @llvm.x86.sse42.pcmpestri128(<16 x i8> %1, i32 7, <16 x i8> %2, i32 7, i8 7) ; <i32> [#uses=1]		%res = call i32 @llvm.x86.sse42.pcmpestri128(<16 x i8> %1, i32 7, <16 x i8> %2, i32 7, i8 7) ; <i32> [#uses=1]
ret i32 %res		ret i32 %res
}		}


define i32 @test_x86_sse42_pcmpestria128(<16 x i8> %a0, <16 x i8> %a2) nounwind {		define i32 @test_x86_sse42_pcmpestria128(<16 x i8> %a0, <16 x i8> %a2) {
; AVX-LABEL: test_x86_sse42_pcmpestria128:		; AVX-LABEL: test_x86_sse42_pcmpestria128:
; AVX: ## BB#0:		; AVX: ## BB#0:
; AVX-NEXT: pushl %ebx
; AVX-NEXT: movl $7, %eax		; AVX-NEXT: movl $7, %eax
; AVX-NEXT: movl $7, %edx		; AVX-NEXT: movl $7, %edx
; AVX-NEXT: xorl %ebx, %ebx
; AVX-NEXT: vpcmpestri $7, %xmm1, %xmm0		; AVX-NEXT: vpcmpestri $7, %xmm1, %xmm0
; AVX-NEXT: seta %bl		; AVX-NEXT: seta %al
; AVX-NEXT: movl %ebx, %eax		; AVX-NEXT: movzbl %al, %eax
; AVX-NEXT: popl %ebx
; AVX-NEXT: retl		; AVX-NEXT: retl
;		;
; AVX512VL-LABEL: test_x86_sse42_pcmpestria128:		; AVX512VL-LABEL: test_x86_sse42_pcmpestria128:
; AVX512VL: ## BB#0:		; AVX512VL: ## BB#0:
; AVX512VL-NEXT: pushl %ebx
; AVX512VL-NEXT: movl $7, %eax		; AVX512VL-NEXT: movl $7, %eax
; AVX512VL-NEXT: movl $7, %edx		; AVX512VL-NEXT: movl $7, %edx
; AVX512VL-NEXT: xorl %ebx, %ebx
; AVX512VL-NEXT: vpcmpestri $7, %xmm1, %xmm0		; AVX512VL-NEXT: vpcmpestri $7, %xmm1, %xmm0
; AVX512VL-NEXT: seta %bl		; AVX512VL-NEXT: seta %al
; AVX512VL-NEXT: movl %ebx, %eax		; AVX512VL-NEXT: movzbl %al, %eax
; AVX512VL-NEXT: popl %ebx
; AVX512VL-NEXT: retl		; AVX512VL-NEXT: retl
%res = call i32 @llvm.x86.sse42.pcmpestria128(<16 x i8> %a0, i32 7, <16 x i8> %a2, i32 7, i8 7) ; <i32> [#uses=1]		%res = call i32 @llvm.x86.sse42.pcmpestria128(<16 x i8> %a0, i32 7, <16 x i8> %a2, i32 7, i8 7) ; <i32> [#uses=1]
ret i32 %res		ret i32 %res
}		}
declare i32 @llvm.x86.sse42.pcmpestria128(<16 x i8>, i32, <16 x i8>, i32, i8) nounwind readnone		declare i32 @llvm.x86.sse42.pcmpestria128(<16 x i8>, i32, <16 x i8>, i32, i8) nounwind readnone


define i32 @test_x86_sse42_pcmpestric128(<16 x i8> %a0, <16 x i8> %a2) {		define i32 @test_x86_sse42_pcmpestric128(<16 x i8> %a0, <16 x i8> %a2) {
[... 15 lines not shown ...]
; AVX512VL-NEXT: andl $1, %eax		; AVX512VL-NEXT: andl $1, %eax
; AVX512VL-NEXT: retl		; AVX512VL-NEXT: retl
%res = call i32 @llvm.x86.sse42.pcmpestric128(<16 x i8> %a0, i32 7, <16 x i8> %a2, i32 7, i8 7) ; <i32> [#uses=1]		%res = call i32 @llvm.x86.sse42.pcmpestric128(<16 x i8> %a0, i32 7, <16 x i8> %a2, i32 7, i8 7) ; <i32> [#uses=1]
ret i32 %res		ret i32 %res
}		}
declare i32 @llvm.x86.sse42.pcmpestric128(<16 x i8>, i32, <16 x i8>, i32, i8) nounwind readnone		declare i32 @llvm.x86.sse42.pcmpestric128(<16 x i8>, i32, <16 x i8>, i32, i8) nounwind readnone


define i32 @test_x86_sse42_pcmpestrio128(<16 x i8> %a0, <16 x i8> %a2) nounwind {		define i32 @test_x86_sse42_pcmpestrio128(<16 x i8> %a0, <16 x i8> %a2) {
; AVX-LABEL: test_x86_sse42_pcmpestrio128:		; AVX-LABEL: test_x86_sse42_pcmpestrio128:
; AVX: ## BB#0:		; AVX: ## BB#0:
; AVX-NEXT: pushl %ebx
; AVX-NEXT: movl $7, %eax		; AVX-NEXT: movl $7, %eax
; AVX-NEXT: movl $7, %edx		; AVX-NEXT: movl $7, %edx
; AVX-NEXT: xorl %ebx, %ebx
; AVX-NEXT: vpcmpestri $7, %xmm1, %xmm0		; AVX-NEXT: vpcmpestri $7, %xmm1, %xmm0
; AVX-NEXT: seto %bl		; AVX-NEXT: seto %al
; AVX-NEXT: movl %ebx, %eax		; AVX-NEXT: movzbl %al, %eax
; AVX-NEXT: popl %ebx
; AVX-NEXT: retl		; AVX-NEXT: retl
;		;
; AVX512VL-LABEL: test_x86_sse42_pcmpestrio128:		; AVX512VL-LABEL: test_x86_sse42_pcmpestrio128:
; AVX512VL: ## BB#0:		; AVX512VL: ## BB#0:
; AVX512VL-NEXT: pushl %ebx
; AVX512VL-NEXT: movl $7, %eax		; AVX512VL-NEXT: movl $7, %eax
; AVX512VL-NEXT: movl $7, %edx		; AVX512VL-NEXT: movl $7, %edx
; AVX512VL-NEXT: xorl %ebx, %ebx
; AVX512VL-NEXT: vpcmpestri $7, %xmm1, %xmm0		; AVX512VL-NEXT: vpcmpestri $7, %xmm1, %xmm0
; AVX512VL-NEXT: seto %bl		; AVX512VL-NEXT: seto %al
; AVX512VL-NEXT: movl %ebx, %eax		; AVX512VL-NEXT: movzbl %al, %eax
; AVX512VL-NEXT: popl %ebx
; AVX512VL-NEXT: retl		; AVX512VL-NEXT: retl
%res = call i32 @llvm.x86.sse42.pcmpestrio128(<16 x i8> %a0, i32 7, <16 x i8> %a2, i32 7, i8 7) ; <i32> [#uses=1]		%res = call i32 @llvm.x86.sse42.pcmpestrio128(<16 x i8> %a0, i32 7, <16 x i8> %a2, i32 7, i8 7) ; <i32> [#uses=1]
ret i32 %res		ret i32 %res
}		}
declare i32 @llvm.x86.sse42.pcmpestrio128(<16 x i8>, i32, <16 x i8>, i32, i8) nounwind readnone		declare i32 @llvm.x86.sse42.pcmpestrio128(<16 x i8>, i32, <16 x i8>, i32, i8) nounwind readnone


define i32 @test_x86_sse42_pcmpestris128(<16 x i8> %a0, <16 x i8> %a2) nounwind {		define i32 @test_x86_sse42_pcmpestris128(<16 x i8> %a0, <16 x i8> %a2) {
; AVX-LABEL: test_x86_sse42_pcmpestris128:		; AVX-LABEL: test_x86_sse42_pcmpestris128:
; AVX: ## BB#0:		; AVX: ## BB#0:
; AVX-NEXT: pushl %ebx
; AVX-NEXT: movl $7, %eax		; AVX-NEXT: movl $7, %eax
; AVX-NEXT: movl $7, %edx		; AVX-NEXT: movl $7, %edx
; AVX-NEXT: xorl %ebx, %ebx
; AVX-NEXT: vpcmpestri $7, %xmm1, %xmm0		; AVX-NEXT: vpcmpestri $7, %xmm1, %xmm0
; AVX-NEXT: sets %bl		; AVX-NEXT: sets %al
; AVX-NEXT: movl %ebx, %eax		; AVX-NEXT: movzbl %al, %eax
; AVX-NEXT: popl %ebx
; AVX-NEXT: retl		; AVX-NEXT: retl
;		;
; AVX512VL-LABEL: test_x86_sse42_pcmpestris128:		; AVX512VL-LABEL: test_x86_sse42_pcmpestris128:
; AVX512VL: ## BB#0:		; AVX512VL: ## BB#0:
; AVX512VL-NEXT: pushl %ebx
; AVX512VL-NEXT: movl $7, %eax		; AVX512VL-NEXT: movl $7, %eax
; AVX512VL-NEXT: movl $7, %edx		; AVX512VL-NEXT: movl $7, %edx
; AVX512VL-NEXT: xorl %ebx, %ebx
; AVX512VL-NEXT: vpcmpestri $7, %xmm1, %xmm0		; AVX512VL-NEXT: vpcmpestri $7, %xmm1, %xmm0
; AVX512VL-NEXT: sets %bl		; AVX512VL-NEXT: sets %al
; AVX512VL-NEXT: movl %ebx, %eax		; AVX512VL-NEXT: movzbl %al, %eax
; AVX512VL-NEXT: popl %ebx
; AVX512VL-NEXT: retl		; AVX512VL-NEXT: retl
%res = call i32 @llvm.x86.sse42.pcmpestris128(<16 x i8> %a0, i32 7, <16 x i8> %a2, i32 7, i8 7) ; <i32> [#uses=1]		%res = call i32 @llvm.x86.sse42.pcmpestris128(<16 x i8> %a0, i32 7, <16 x i8> %a2, i32 7, i8 7) ; <i32> [#uses=1]
ret i32 %res		ret i32 %res
}		}
declare i32 @llvm.x86.sse42.pcmpestris128(<16 x i8>, i32, <16 x i8>, i32, i8) nounwind readnone		declare i32 @llvm.x86.sse42.pcmpestris128(<16 x i8>, i32, <16 x i8>, i32, i8) nounwind readnone


define i32 @test_x86_sse42_pcmpestriz128(<16 x i8> %a0, <16 x i8> %a2) nounwind {		define i32 @test_x86_sse42_pcmpestriz128(<16 x i8> %a0, <16 x i8> %a2) {
; AVX-LABEL: test_x86_sse42_pcmpestriz128:		; AVX-LABEL: test_x86_sse42_pcmpestriz128:
; AVX: ## BB#0:		; AVX: ## BB#0:
; AVX-NEXT: pushl %ebx
; AVX-NEXT: movl $7, %eax		; AVX-NEXT: movl $7, %eax
; AVX-NEXT: movl $7, %edx		; AVX-NEXT: movl $7, %edx
; AVX-NEXT: xorl %ebx, %ebx
; AVX-NEXT: vpcmpestri $7, %xmm1, %xmm0		; AVX-NEXT: vpcmpestri $7, %xmm1, %xmm0
; AVX-NEXT: sete %bl		; AVX-NEXT: sete %al
; AVX-NEXT: movl %ebx, %eax		; AVX-NEXT: movzbl %al, %eax
; AVX-NEXT: popl %ebx
; AVX-NEXT: retl		; AVX-NEXT: retl
;		;
; AVX512VL-LABEL: test_x86_sse42_pcmpestriz128:		; AVX512VL-LABEL: test_x86_sse42_pcmpestriz128:
; AVX512VL: ## BB#0:		; AVX512VL: ## BB#0:
; AVX512VL-NEXT: pushl %ebx
; AVX512VL-NEXT: movl $7, %eax		; AVX512VL-NEXT: movl $7, %eax
; AVX512VL-NEXT: movl $7, %edx		; AVX512VL-NEXT: movl $7, %edx
; AVX512VL-NEXT: xorl %ebx, %ebx
; AVX512VL-NEXT: vpcmpestri $7, %xmm1, %xmm0		; AVX512VL-NEXT: vpcmpestri $7, %xmm1, %xmm0
; AVX512VL-NEXT: sete %bl		; AVX512VL-NEXT: sete %al
; AVX512VL-NEXT: movl %ebx, %eax		; AVX512VL-NEXT: movzbl %al, %eax
; AVX512VL-NEXT: popl %ebx
; AVX512VL-NEXT: retl		; AVX512VL-NEXT: retl
%res = call i32 @llvm.x86.sse42.pcmpestriz128(<16 x i8> %a0, i32 7, <16 x i8> %a2, i32 7, i8 7) ; <i32> [#uses=1]		%res = call i32 @llvm.x86.sse42.pcmpestriz128(<16 x i8> %a0, i32 7, <16 x i8> %a2, i32 7, i8 7) ; <i32> [#uses=1]
ret i32 %res		ret i32 %res
}		}
declare i32 @llvm.x86.sse42.pcmpestriz128(<16 x i8>, i32, <16 x i8>, i32, i8) nounwind readnone		declare i32 @llvm.x86.sse42.pcmpestriz128(<16 x i8>, i32, <16 x i8>, i32, i8) nounwind readnone


define <16 x i8> @test_x86_sse42_pcmpestrm128(<16 x i8> %a0, <16 x i8> %a2) {		define <16 x i8> @test_x86_sse42_pcmpestrm128(<16 x i8> %a0, <16 x i8> %a2) {
[... 261 lines not shown ...]
; AVX512VL-NEXT: retl
ret <4 x float> %res		ret <4 x float> %res
}		}
declare <4 x float> @llvm.x86.sse.cmp.ss(<4 x float>, <4 x float>, i8) nounwind readnone		declare <4 x float> @llvm.x86.sse.cmp.ss(<4 x float>, <4 x float>, i8) nounwind readnone


define i32 @test_x86_sse_comieq_ss(<4 x float> %a0, <4 x float> %a1) {		define i32 @test_x86_sse_comieq_ss(<4 x float> %a0, <4 x float> %a1) {
; AVX-LABEL: test_x86_sse_comieq_ss:		; AVX-LABEL: test_x86_sse_comieq_ss:
; AVX: ## BB#0:		; AVX: ## BB#0:
		; AVX-NEXT: xorl %eax, %eax
; AVX-NEXT: vcomiss %xmm1, %xmm0		; AVX-NEXT: vcomiss %xmm1, %xmm0
; AVX-NEXT: setnp %al		; AVX-NEXT: setnp %cl
; AVX-NEXT: sete %cl		; AVX-NEXT: sete %al
; AVX-NEXT: andb %al, %cl		; AVX-NEXT: andb %cl, %al
; AVX-NEXT: movzbl %cl, %eax
; AVX-NEXT: retl		; AVX-NEXT: retl
;		;
; AVX512VL-LABEL: test_x86_sse_comieq_ss:		; AVX512VL-LABEL: test_x86_sse_comieq_ss:
; AVX512VL: ## BB#0:		; AVX512VL: ## BB#0:
		; AVX512VL-NEXT: xorl %eax, %eax
; AVX512VL-NEXT: vcomiss %xmm1, %xmm0		; AVX512VL-NEXT: vcomiss %xmm1, %xmm0
; AVX512VL-NEXT: setnp %al		; AVX512VL-NEXT: setnp %cl
; AVX512VL-NEXT: sete %cl		; AVX512VL-NEXT: sete %al
; AVX512VL-NEXT: andb %al, %cl		; AVX512VL-NEXT: andb %cl, %al
; AVX512VL-NEXT: movzbl %cl, %eax
; AVX512VL-NEXT: retl		; AVX512VL-NEXT: retl
%res = call i32 @llvm.x86.sse.comieq.ss(<4 x float> %a0, <4 x float> %a1) ; <i32> [#uses=1]		%res = call i32 @llvm.x86.sse.comieq.ss(<4 x float> %a0, <4 x float> %a1) ; <i32> [#uses=1]
ret i32 %res		ret i32 %res
}		}
declare i32 @llvm.x86.sse.comieq.ss(<4 x float>, <4 x float>) nounwind readnone		declare i32 @llvm.x86.sse.comieq.ss(<4 x float>, <4 x float>) nounwind readnone


define i32 @test_x86_sse_comige_ss(<4 x float> %a0, <4 x float> %a1) {		define i32 @test_x86_sse_comige_ss(<4 x float> %a0, <4 x float> %a1) {
[... 74 lines not shown ...]
; AVX512VL-NEXT: retl
ret i32 %res		ret i32 %res
}		}
declare i32 @llvm.x86.sse.comilt.ss(<4 x float>, <4 x float>) nounwind readnone		declare i32 @llvm.x86.sse.comilt.ss(<4 x float>, <4 x float>) nounwind readnone


define i32 @test_x86_sse_comineq_ss(<4 x float> %a0, <4 x float> %a1) {		define i32 @test_x86_sse_comineq_ss(<4 x float> %a0, <4 x float> %a1) {
; AVX-LABEL: test_x86_sse_comineq_ss:		; AVX-LABEL: test_x86_sse_comineq_ss:
; AVX: ## BB#0:		; AVX: ## BB#0:
		; AVX-NEXT: xorl %eax, %eax
; AVX-NEXT: vcomiss %xmm1, %xmm0		; AVX-NEXT: vcomiss %xmm1, %xmm0
; AVX-NEXT: setp %al		; AVX-NEXT: setp %cl
; AVX-NEXT: setne %cl		; AVX-NEXT: setne %al
; AVX-NEXT: orb %al, %cl		; AVX-NEXT: orb %cl, %al
; AVX-NEXT: movzbl %cl, %eax
; AVX-NEXT: retl		; AVX-NEXT: retl
;		;
; AVX512VL-LABEL: test_x86_sse_comineq_ss:		; AVX512VL-LABEL: test_x86_sse_comineq_ss:
; AVX512VL: ## BB#0:		; AVX512VL: ## BB#0:
		; AVX512VL-NEXT: xorl %eax, %eax
; AVX512VL-NEXT: vcomiss %xmm1, %xmm0		; AVX512VL-NEXT: vcomiss %xmm1, %xmm0
; AVX512VL-NEXT: setp %al		; AVX512VL-NEXT: setp %cl
; AVX512VL-NEXT: setne %cl		; AVX512VL-NEXT: setne %al
; AVX512VL-NEXT: orb %al, %cl		; AVX512VL-NEXT: orb %cl, %al
; AVX512VL-NEXT: movzbl %cl, %eax
; AVX512VL-NEXT: retl		; AVX512VL-NEXT: retl
%res = call i32 @llvm.x86.sse.comineq.ss(<4 x float> %a0, <4 x float> %a1) ; <i32> [#uses=1]		%res = call i32 @llvm.x86.sse.comineq.ss(<4 x float> %a0, <4 x float> %a1) ; <i32> [#uses=1]
ret i32 %res		ret i32 %res
}		}
declare i32 @llvm.x86.sse.comineq.ss(<4 x float>, <4 x float>) nounwind readnone		declare i32 @llvm.x86.sse.comineq.ss(<4 x float>, <4 x float>) nounwind readnone


define <4 x float> @test_x86_sse_cvtsi2ss(<4 x float> %a0) {		define <4 x float> @test_x86_sse_cvtsi2ss(<4 x float> %a0) {
[... 306 lines not shown ...]
; AVX512VL-NEXT: retl
ret <4 x float> %res		ret <4 x float> %res
}		}
declare <4 x float> @llvm.x86.sse.sub.ss(<4 x float>, <4 x float>) nounwind readnone		declare <4 x float> @llvm.x86.sse.sub.ss(<4 x float>, <4 x float>) nounwind readnone


define i32 @test_x86_sse_ucomieq_ss(<4 x float> %a0, <4 x float> %a1) {		define i32 @test_x86_sse_ucomieq_ss(<4 x float> %a0, <4 x float> %a1) {
; AVX-LABEL: test_x86_sse_ucomieq_ss:		; AVX-LABEL: test_x86_sse_ucomieq_ss:
; AVX: ## BB#0:		; AVX: ## BB#0:
		; AVX-NEXT: xorl %eax, %eax
; AVX-NEXT: vucomiss %xmm1, %xmm0		; AVX-NEXT: vucomiss %xmm1, %xmm0
; AVX-NEXT: setnp %al		; AVX-NEXT: setnp %cl
; AVX-NEXT: sete %cl		; AVX-NEXT: sete %al
; AVX-NEXT: andb %al, %cl		; AVX-NEXT: andb %cl, %al
; AVX-NEXT: movzbl %cl, %eax
; AVX-NEXT: retl		; AVX-NEXT: retl
;		;
; AVX512VL-LABEL: test_x86_sse_ucomieq_ss:		; AVX512VL-LABEL: test_x86_sse_ucomieq_ss:
; AVX512VL: ## BB#0:		; AVX512VL: ## BB#0:
		; AVX512VL-NEXT: xorl %eax, %eax
; AVX512VL-NEXT: vucomiss %xmm1, %xmm0		; AVX512VL-NEXT: vucomiss %xmm1, %xmm0
; AVX512VL-NEXT: setnp %al		; AVX512VL-NEXT: setnp %cl
; AVX512VL-NEXT: sete %cl		; AVX512VL-NEXT: sete %al
; AVX512VL-NEXT: andb %al, %cl		; AVX512VL-NEXT: andb %cl, %al
; AVX512VL-NEXT: movzbl %cl, %eax
; AVX512VL-NEXT: retl		; AVX512VL-NEXT: retl
%res = call i32 @llvm.x86.sse.ucomieq.ss(<4 x float> %a0, <4 x float> %a1) ; <i32> [#uses=1]		%res = call i32 @llvm.x86.sse.ucomieq.ss(<4 x float> %a0, <4 x float> %a1) ; <i32> [#uses=1]
ret i32 %res		ret i32 %res
}		}
declare i32 @llvm.x86.sse.ucomieq.ss(<4 x float>, <4 x float>) nounwind readnone		declare i32 @llvm.x86.sse.ucomieq.ss(<4 x float>, <4 x float>) nounwind readnone


define i32 @test_x86_sse_ucomige_ss(<4 x float> %a0, <4 x float> %a1) {		define i32 @test_x86_sse_ucomige_ss(<4 x float> %a0, <4 x float> %a1) {
[... 74 lines not shown ...]
; AVX512VL-NEXT: retl
ret i32 %res		ret i32 %res
}		}
declare i32 @llvm.x86.sse.ucomilt.ss(<4 x float>, <4 x float>) nounwind readnone		declare i32 @llvm.x86.sse.ucomilt.ss(<4 x float>, <4 x float>) nounwind readnone


define i32 @test_x86_sse_ucomineq_ss(<4 x float> %a0, <4 x float> %a1) {		define i32 @test_x86_sse_ucomineq_ss(<4 x float> %a0, <4 x float> %a1) {
; AVX-LABEL: test_x86_sse_ucomineq_ss:		; AVX-LABEL: test_x86_sse_ucomineq_ss:
; AVX: ## BB#0:		; AVX: ## BB#0:
		; AVX-NEXT: xorl %eax, %eax
; AVX-NEXT: vucomiss %xmm1, %xmm0		; AVX-NEXT: vucomiss %xmm1, %xmm0
; AVX-NEXT: setp %al		; AVX-NEXT: setp %cl
; AVX-NEXT: setne %cl		; AVX-NEXT: setne %al
; AVX-NEXT: orb %al, %cl		; AVX-NEXT: orb %cl, %al
; AVX-NEXT: movzbl %cl, %eax
; AVX-NEXT: retl		; AVX-NEXT: retl
;		;
; AVX512VL-LABEL: test_x86_sse_ucomineq_ss:		; AVX512VL-LABEL: test_x86_sse_ucomineq_ss:
; AVX512VL: ## BB#0:		; AVX512VL: ## BB#0:
		; AVX512VL-NEXT: xorl %eax, %eax
; AVX512VL-NEXT: vucomiss %xmm1, %xmm0		; AVX512VL-NEXT: vucomiss %xmm1, %xmm0
; AVX512VL-NEXT: setp %al		; AVX512VL-NEXT: setp %cl
; AVX512VL-NEXT: setne %cl		; AVX512VL-NEXT: setne %al
; AVX512VL-NEXT: orb %al, %cl		; AVX512VL-NEXT: orb %cl, %al
; AVX512VL-NEXT: movzbl %cl, %eax
; AVX512VL-NEXT: retl		; AVX512VL-NEXT: retl
%res = call i32 @llvm.x86.sse.ucomineq.ss(<4 x float> %a0, <4 x float> %a1) ; <i32> [#uses=1]		%res = call i32 @llvm.x86.sse.ucomineq.ss(<4 x float> %a0, <4 x float> %a1) ; <i32> [#uses=1]
ret i32 %res		ret i32 %res
}		}
declare i32 @llvm.x86.sse.ucomineq.ss(<4 x float>, <4 x float>) nounwind readnone		declare i32 @llvm.x86.sse.ucomineq.ss(<4 x float>, <4 x float>) nounwind readnone


define <16 x i8> @test_x86_ssse3_pabs_b_128(<16 x i8> %a0) {		define <16 x i8> @test_x86_ssse3_pabs_b_128(<16 x i8> %a0) {
[... 1,705 lines not shown ...]

test/CodeGen/X86/avx512-cmp.ll

[... 45 lines not shown ...]
l2:
%c1 = fadd float %a, %b		%c1 = fadd float %a, %b
ret float %c1		ret float %c1
}		}

; FIXME: Can use vcmpeqss and extract from the mask here in AVX512.		; FIXME: Can use vcmpeqss and extract from the mask here in AVX512.
define i32 @test3(float %a, float %b) {		define i32 @test3(float %a, float %b) {
; ALL-LABEL: test3:		; ALL-LABEL: test3:
; ALL: ## BB#0:		; ALL: ## BB#0:
		; ALL-NEXT: xorl %eax, %eax
; ALL-NEXT: vucomiss %xmm1, %xmm0		; ALL-NEXT: vucomiss %xmm1, %xmm0
; ALL-NEXT: setnp %al		; ALL-NEXT: setnp %cl
; ALL-NEXT: sete %cl		; ALL-NEXT: sete %al
; ALL-NEXT: andb %al, %cl		; ALL-NEXT: andb %cl, %al
; ALL-NEXT: movzbl %cl, %eax
; ALL-NEXT: retq		; ALL-NEXT: retq

%cmp10.i = fcmp oeq float %a, %b		%cmp10.i = fcmp oeq float %a, %b
%conv11.i = zext i1 %cmp10.i to i32		%conv11.i = zext i1 %cmp10.i to i32
ret i32 %conv11.i		ret i32 %conv11.i
}		}

define float @test5(float %p) #0 {		define float @test5(float %p) #0 {
; ALL-LABEL: test5:		; ALL-LABEL: test5:
; ALL: ## BB#0: ## %entry		; ALL: ## BB#0: ## %entry
; ALL-NEXT: vxorps %xmm1, %xmm1, %xmm1		; ALL-NEXT: vxorps %xmm1, %xmm1, %xmm1
		; ALL-NEXT: xorl %eax, %eax
; ALL-NEXT: vucomiss %xmm1, %xmm0		; ALL-NEXT: vucomiss %xmm1, %xmm0
; ALL-NEXT: jne LBB3_1		; ALL-NEXT: jne LBB3_1
; ALL-NEXT: jnp LBB3_2		; ALL-NEXT: jnp LBB3_2
; ALL-NEXT: LBB3_1: ## %if.end		; ALL-NEXT: LBB3_1: ## %if.end
; ALL-NEXT: seta %al		; ALL-NEXT: seta %al
; ALL-NEXT: movzbl %al, %eax
; ALL-NEXT: leaq {{.*}}(%rip), %rcx		; ALL-NEXT: leaq {{.*}}(%rip), %rcx
; ALL-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero		; ALL-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
; ALL-NEXT: LBB3_2: ## %return		; ALL-NEXT: LBB3_2: ## %return
; ALL-NEXT: retq		; ALL-NEXT: retq
entry:		entry:
%cmp = fcmp oeq float %p, 0.000000e+00		%cmp = fcmp oeq float %p, 0.000000e+00
br i1 %cmp, label %return, label %if.end		br i1 %cmp, label %return, label %if.end

[... 106 lines not shown ...]

test/CodeGen/X86/cmpxchg-i1.ll

[... 28 lines not shown ...]
false:
call void @bar()		call void @bar()
ret void		ret void
}		}

define i64 @cmpxchg_sext(i32* %addr, i32 %desired, i32 %new) {		define i64 @cmpxchg_sext(i32* %addr, i32 %desired, i32 %new) {
; CHECK-LABEL: cmpxchg_sext:		; CHECK-LABEL: cmpxchg_sext:
; CHECK-DAG: cmpxchgl		; CHECK-DAG: cmpxchgl
; CHECK-NOT: cmpl		; CHECK-NOT: cmpl
; CHECK: sete %cl		; CHECK: sete %al
; CHECK: retq		; CHECK: retq
%pair = cmpxchg i32* %addr, i32 %desired, i32 %new seq_cst seq_cst		%pair = cmpxchg i32* %addr, i32 %desired, i32 %new seq_cst seq_cst
%success = extractvalue { i32, i1 } %pair, 1		%success = extractvalue { i32, i1 } %pair, 1
%mask = sext i1 %success to i64		%mask = sext i1 %success to i64
ret i64 %mask		ret i64 %mask
}		}

define i32 @cmpxchg_zext(i32* %addr, i32 %desired, i32 %new) {		define i32 @cmpxchg_zext(i32* %addr, i32 %desired, i32 %new) {
; CHECK-LABEL: cmpxchg_zext:		; CHECK-LABEL: cmpxchg_zext:
; CHECK: xorl %e[[R:[a-z]]]x
; CHECK: cmpxchgl		; CHECK: cmpxchgl
; CHECK-NOT: cmp		; CHECK-NOT: cmp
; CHECK: sete %[[R]]l		; CHECK: sete [[BYTE:%[a-z0-9]+]]
		; CHECK: movzbl [[BYTE]], %eax
%pair = cmpxchg i32* %addr, i32 %desired, i32 %new seq_cst seq_cst		%pair = cmpxchg i32* %addr, i32 %desired, i32 %new seq_cst seq_cst
%success = extractvalue { i32, i1 } %pair, 1		%success = extractvalue { i32, i1 } %pair, 1
%mask = zext i1 %success to i32		%mask = zext i1 %success to i32
ret i32 %mask		ret i32 %mask
}		}


define i32 @cmpxchg_use_eflags_and_val(i32* %addr, i32 %offset) {		define i32 @cmpxchg_use_eflags_and_val(i32* %addr, i32 %offset) {
[... 29 lines not shown ...]

test/CodeGen/X86/cmpxchg-i128-i1.ll

[... 38 lines not shown ...]
; CHECK: retq
%pair = cmpxchg i128* %addr, i128 %desired, i128 %new seq_cst seq_cst		%pair = cmpxchg i128* %addr, i128 %desired, i128 %new seq_cst seq_cst
%oldval = extractvalue { i128, i1 } %pair, 0		%oldval = extractvalue { i128, i1 } %pair, 0
%success = icmp sge i128 %oldval, %desired		%success = icmp sge i128 %oldval, %desired
ret i1 %success		ret i1 %success
}		}

define i128 @cmpxchg_zext(i128* %addr, i128 %desired, i128 %new) {		define i128 @cmpxchg_zext(i128* %addr, i128 %desired, i128 %new) {
; CHECK-LABEL: cmpxchg_zext:		; CHECK-LABEL: cmpxchg_zext:
; CHECK: xorl
; CHECK: cmpxchg16b		; CHECK: cmpxchg16b
; CHECK-NOT: cmpq		; CHECK-NOT: cmpq
; CHECK: sete		; CHECK: sete [[BYTE:%[a-z0-9]+]]
		; CHECK: movzbl [[BYTE]], %eax
%pair = cmpxchg i128* %addr, i128 %desired, i128 %new seq_cst seq_cst		%pair = cmpxchg i128* %addr, i128 %desired, i128 %new seq_cst seq_cst
%success = extractvalue { i128, i1 } %pair, 1		%success = extractvalue { i128, i1 } %pair, 1
%mask = zext i1 %success to i128		%mask = zext i1 %success to i128
ret i128 %mask		ret i128 %mask
}		}


define i128 @cmpxchg_use_eflags_and_val(i128* %addr, i128 %offset) {		define i128 @cmpxchg_use_eflags_and_val(i128* %addr, i128 %offset) {
[... 25 lines not shown ...]

test/CodeGen/X86/fast-isel-cmp.ll

	; RUN: llc < %s -mtriple=x86_64-apple-darwin10 | FileCheck %s --check-prefix=SDAG			; RUN: llc < %s -mtriple=x86_64-apple-darwin10 | FileCheck %s --check-prefix=SDAG
	; RUN: llc < %s -fast-isel -fast-isel-abort=1 -mtriple=x86_64-apple-darwin10 | FileCheck %s --check-prefix=FAST			; RUN: llc < %s -fast-isel -fast-isel-abort=1 -mtriple=x86_64-apple-darwin10 | FileCheck %s --check-prefix=FAST

	define zeroext i1 @fcmp_oeq(float %x, float %y) {			define zeroext i1 @fcmp_oeq(float %x, float %y) {
	; SDAG-LABEL: fcmp_oeq			; SDAG-LABEL: fcmp_oeq
	; SDAG: cmpeqss %xmm1, %xmm0			; SDAG: cmpeqss %xmm1, %xmm0
	; SDAG-NEXT: movd %xmm0, %eax			; SDAG-NEXT: movd %xmm0, %eax
	; SDAG-NEXT: andl $1, %eax			; SDAG-NEXT: andl $1, %eax
	; FAST-LABEL: fcmp_oeq			; FAST-LABEL: fcmp_oeq
	; FAST: ucomiss %xmm1, %xmm0			; FAST: ucomiss %xmm1, %xmm0
	; FAST-NEXT: sete %al			; FAST-NEXT: sete %cl
	; FAST-NEXT: setnp %cl			; FAST-NEXT: setnp %al
	; FAST-NEXT: andb %al, %cl			; FAST-NEXT: andb %cl, %al
	%1 = fcmp oeq float %x, %y			%1 = fcmp oeq float %x, %y
	ret i1 %1			ret i1 %1
	}			}

	define zeroext i1 @fcmp_ogt(float %x, float %y) {			define zeroext i1 @fcmp_ogt(float %x, float %y) {
	; SDAG-LABEL: fcmp_ogt			; SDAG-LABEL: fcmp_ogt
	; SDAG: ucomiss %xmm1, %xmm0			; SDAG: ucomiss %xmm1, %xmm0
	; SDAG-NEXT: seta %al			; SDAG-NEXT: seta %al
	[... 126 lines not shown ...]

	define zeroext i1 @fcmp_une(float %x, float %y) {			define zeroext i1 @fcmp_une(float %x, float %y) {
	; SDAG-LABEL: fcmp_une			; SDAG-LABEL: fcmp_une
	; SDAG: cmpneqss %xmm1, %xmm0			; SDAG: cmpneqss %xmm1, %xmm0
	; SDAG-NEXT: movd %xmm0, %eax			; SDAG-NEXT: movd %xmm0, %eax
	; SDAG-NEXT: andl $1, %eax			; SDAG-NEXT: andl $1, %eax
	; FAST-LABEL: fcmp_une			; FAST-LABEL: fcmp_une
	; FAST: ucomiss %xmm1, %xmm0			; FAST: ucomiss %xmm1, %xmm0
	; FAST-NEXT: setne %al			; FAST-NEXT: setne %cl
	; FAST-NEXT: setp %cl			; FAST-NEXT: setp %al
	; FAST-NEXT: orb %al, %cl			; FAST-NEXT: orb %cl, %al
	%1 = fcmp une float %x, %y			%1 = fcmp une float %x, %y
	ret i1 %1			ret i1 %1
	}			}

	define zeroext i1 @icmp_eq(i32 %x, i32 %y) {			define zeroext i1 @icmp_eq(i32 %x, i32 %y) {
	; SDAG-LABEL: icmp_eq			; SDAG-LABEL: icmp_eq
	; SDAG: cmpl %esi, %edi			; SDAG: cmpl %esi, %edi
	; SDAG-NEXT: sete %al			; SDAG-NEXT: sete %al
	[... 118 lines not shown ...]
	define zeroext i1 @fcmp_oeq3(float %x) {			define zeroext i1 @fcmp_oeq3(float %x) {
	; SDAG-LABEL: fcmp_oeq3			; SDAG-LABEL: fcmp_oeq3
	; SDAG: xorps %xmm1, %xmm1			; SDAG: xorps %xmm1, %xmm1
	; SDAG-NEXT: cmpeqss %xmm1, %xmm0			; SDAG-NEXT: cmpeqss %xmm1, %xmm0
	; SDAG-NEXT: movd %xmm0, %eax			; SDAG-NEXT: movd %xmm0, %eax
	; SDAG-NEXT: andl $1, %eax			; SDAG-NEXT: andl $1, %eax
	; FAST-LABEL: fcmp_oeq3			; FAST-LABEL: fcmp_oeq3
	; FAST: xorps %xmm1, %xmm1			; FAST: xorps %xmm1, %xmm1
				; FAST-NEXT: xorl %eax, %eax
	; FAST-NEXT: ucomiss %xmm1, %xmm0			; FAST-NEXT: ucomiss %xmm1, %xmm0
	; FAST-NEXT: sete %al			; FAST-NEXT: sete %cl
	; FAST-NEXT: setnp %cl			; FAST-NEXT: setnp %al
	; FAST-NEXT: andb %al, %cl			; FAST-NEXT: andb %cl, %al
	%1 = fcmp oeq float %x, 0.000000e+00			%1 = fcmp oeq float %x, 0.000000e+00
	ret i1 %1			ret i1 %1
	}			}

	define zeroext i1 @fcmp_ogt2(float %x) {			define zeroext i1 @fcmp_ogt2(float %x) {
	; SDAG-LABEL: fcmp_ogt2			; SDAG-LABEL: fcmp_ogt2
	; SDAG: xorl %eax, %eax			; SDAG: xorl %eax, %eax
	; FAST-LABEL: fcmp_ogt2			; FAST-LABEL: fcmp_ogt2
	; FAST: xorl %eax, %eax			; FAST: xorl %eax, %eax
	%1 = fcmp ogt float %x, %x			%1 = fcmp ogt float %x, %x
	ret i1 %1			ret i1 %1
	}			}

	define zeroext i1 @fcmp_ogt3(float %x) {			define zeroext i1 @fcmp_ogt3(float %x) {
	; SDAG-LABEL: fcmp_ogt3			; SDAG-LABEL: fcmp_ogt3
	; SDAG: xorps %xmm1, %xmm1			; SDAG: xorps %xmm1, %xmm1
	; SDAG-NEXT: ucomiss %xmm1, %xmm0			; SDAG-NEXT: ucomiss %xmm1, %xmm0
	; SDAG-NEXT: seta %al			; SDAG-NEXT: seta %al
	; FAST-LABEL: fcmp_ogt3			; FAST-LABEL: fcmp_ogt3
	; FAST: xorps %xmm1, %xmm1			; FAST: xorps %xmm1, %xmm1
				; FAST-NEXT: xorl %eax, %eax
	; FAST-NEXT: ucomiss %xmm1, %xmm0			; FAST-NEXT: ucomiss %xmm1, %xmm0
	; FAST-NEXT: seta %al			; FAST-NEXT: seta %al
	%1 = fcmp ogt float %x, 0.000000e+00			%1 = fcmp ogt float %x, 0.000000e+00
	ret i1 %1			ret i1 %1
	}			}

	define zeroext i1 @fcmp_oge2(float %x) {			define zeroext i1 @fcmp_oge2(float %x) {
	; SDAG-LABEL: fcmp_oge2			; SDAG-LABEL: fcmp_oge2
	; SDAG: ucomiss %xmm0, %xmm0			; SDAG: ucomiss %xmm0, %xmm0
	; SDAG-NEXT: setnp %al			; SDAG-NEXT: setnp %al
	; FAST-LABEL: fcmp_oge2			; FAST-LABEL: fcmp_oge2
	; FAST: ucomiss %xmm0, %xmm0			; FAST: ucomiss %xmm0, %xmm0
	; FAST-NEXT: setnp %al			; FAST-NEXT: setnp %al
	%1 = fcmp oge float %x, %x			%1 = fcmp oge float %x, %x
	ret i1 %1			ret i1 %1
	}			}

	define zeroext i1 @fcmp_oge3(float %x) {			define zeroext i1 @fcmp_oge3(float %x) {
	; SDAG-LABEL: fcmp_oge3			; SDAG-LABEL: fcmp_oge3
	; SDAG: xorps %xmm1, %xmm1			; SDAG: xorps %xmm1, %xmm1
	; SDAG-NEXT: ucomiss %xmm1, %xmm0			; SDAG-NEXT: ucomiss %xmm1, %xmm0
	; SDAG-NEXT: setae %al			; SDAG-NEXT: setae %al
	; FAST-LABEL: fcmp_oge3			; FAST-LABEL: fcmp_oge3
	; FAST: xorps %xmm1, %xmm1			; FAST: xorps %xmm1, %xmm1
				; FAST-NEXT: xorl %eax, %eax
	; FAST-NEXT: ucomiss %xmm1, %xmm0			; FAST-NEXT: ucomiss %xmm1, %xmm0
	; FAST-NEXT: setae %al			; FAST-NEXT: setae %al
	%1 = fcmp oge float %x, 0.000000e+00			%1 = fcmp oge float %x, 0.000000e+00
	ret i1 %1			ret i1 %1
	}			}

	define zeroext i1 @fcmp_olt2(float %x) {			define zeroext i1 @fcmp_olt2(float %x) {
	; SDAG-LABEL: fcmp_olt2			; SDAG-LABEL: fcmp_olt2
	; SDAG: xorl %eax, %eax			; SDAG: xorl %eax, %eax
	; FAST-LABEL: fcmp_olt2			; FAST-LABEL: fcmp_olt2
	; FAST: xorl %eax, %eax			; FAST: xorl %eax, %eax
	%1 = fcmp olt float %x, %x			%1 = fcmp olt float %x, %x
	ret i1 %1			ret i1 %1
	}			}

	define zeroext i1 @fcmp_olt3(float %x) {			define zeroext i1 @fcmp_olt3(float %x) {
	; SDAG-LABEL: fcmp_olt3			; SDAG-LABEL: fcmp_olt3
	; SDAG: xorps %xmm1, %xmm1			; SDAG: xorps %xmm1, %xmm1
	; SDAG-NEXT: ucomiss %xmm0, %xmm1			; SDAG-NEXT: ucomiss %xmm0, %xmm1
	; SDAG-NEXT: seta %al			; SDAG-NEXT: seta %al
	; FAST-LABEL: fcmp_olt3			; FAST-LABEL: fcmp_olt3
	; FAST: xorps %xmm1, %xmm1			; FAST: xorps %xmm1, %xmm1
				; FAST-NEXT: xorl %eax, %eax
	; FAST-NEXT: ucomiss %xmm0, %xmm1			; FAST-NEXT: ucomiss %xmm0, %xmm1
	; FAST-NEXT: seta %al			; FAST-NEXT: seta %al
	%1 = fcmp olt float %x, 0.000000e+00			%1 = fcmp olt float %x, 0.000000e+00
	ret i1 %1			ret i1 %1
	}			}

	define zeroext i1 @fcmp_ole2(float %x) {			define zeroext i1 @fcmp_ole2(float %x) {
	; SDAG-LABEL: fcmp_ole2			; SDAG-LABEL: fcmp_ole2
	; SDAG: ucomiss %xmm0, %xmm0			; SDAG: ucomiss %xmm0, %xmm0
	; SDAG-NEXT: setnp %al			; SDAG-NEXT: setnp %al
	; FAST-LABEL: fcmp_ole2			; FAST-LABEL: fcmp_ole2
	; FAST: ucomiss %xmm0, %xmm0			; FAST: ucomiss %xmm0, %xmm0
	; FAST-NEXT: setnp %al			; FAST-NEXT: setnp %al
	%1 = fcmp ole float %x, %x			%1 = fcmp ole float %x, %x
	ret i1 %1			ret i1 %1
	}			}

	define zeroext i1 @fcmp_ole3(float %x) {			define zeroext i1 @fcmp_ole3(float %x) {
	; SDAG-LABEL: fcmp_ole3			; SDAG-LABEL: fcmp_ole3
	; SDAG: xorps %xmm1, %xmm1			; SDAG: xorps %xmm1, %xmm1
	; SDAG-NEXT: ucomiss %xmm0, %xmm1			; SDAG-NEXT: ucomiss %xmm0, %xmm1
	; SDAG-NEXT: setae %al			; SDAG-NEXT: setae %al
	; FAST-LABEL: fcmp_ole3			; FAST-LABEL: fcmp_ole3
	; FAST: xorps %xmm1, %xmm1			; FAST: xorps %xmm1, %xmm1
				; FAST-NEXT: xorl %eax, %eax
	; FAST-NEXT: ucomiss %xmm0, %xmm1			; FAST-NEXT: ucomiss %xmm0, %xmm1
	; FAST-NEXT: setae %al			; FAST-NEXT: setae %al
	%1 = fcmp ole float %x, 0.000000e+00			%1 = fcmp ole float %x, 0.000000e+00
	ret i1 %1			ret i1 %1
	}			}

	define zeroext i1 @fcmp_one2(float %x) {			define zeroext i1 @fcmp_one2(float %x) {
	; SDAG-LABEL: fcmp_one2			; SDAG-LABEL: fcmp_one2
	; SDAG: xorl %eax, %eax			; SDAG: xorl %eax, %eax
	; FAST-LABEL: fcmp_one2			; FAST-LABEL: fcmp_one2
	; FAST: xorl %eax, %eax			; FAST: xorl %eax, %eax
	%1 = fcmp one float %x, %x			%1 = fcmp one float %x, %x
	ret i1 %1			ret i1 %1
	}			}

	define zeroext i1 @fcmp_one3(float %x) {			define zeroext i1 @fcmp_one3(float %x) {
	; SDAG-LABEL: fcmp_one3			; SDAG-LABEL: fcmp_one3
	; SDAG: xorps %xmm1, %xmm1			; SDAG: xorps %xmm1, %xmm1
	; SDAG-NEXT: ucomiss %xmm1, %xmm0			; SDAG-NEXT: ucomiss %xmm1, %xmm0
	; SDAG-NEXT: setne %al			; SDAG-NEXT: setne %al
	; FAST-LABEL: fcmp_one3			; FAST-LABEL: fcmp_one3
	; FAST: xorps %xmm1, %xmm1			; FAST: xorps %xmm1, %xmm1
				; FAST-NEXT: xorl %eax, %eax
	; FAST-NEXT: ucomiss %xmm1, %xmm0			; FAST-NEXT: ucomiss %xmm1, %xmm0
	; FAST-NEXT: setne %al			; FAST-NEXT: setne %al
	%1 = fcmp one float %x, 0.000000e+00			%1 = fcmp one float %x, 0.000000e+00
	ret i1 %1			ret i1 %1
	}			}

	define zeroext i1 @fcmp_ord2(float %x) {			define zeroext i1 @fcmp_ord2(float %x) {
	; SDAG-LABEL: fcmp_ord2			; SDAG-LABEL: fcmp_ord2
	[50 lines not shown]

	define zeroext i1 @fcmp_ueq3(float %x) {			define zeroext i1 @fcmp_ueq3(float %x) {
	; SDAG-LABEL: fcmp_ueq3			; SDAG-LABEL: fcmp_ueq3
	; SDAG: xorps %xmm1, %xmm1			; SDAG: xorps %xmm1, %xmm1
	; SDAG-NEXT: ucomiss %xmm1, %xmm0			; SDAG-NEXT: ucomiss %xmm1, %xmm0
	; SDAG-NEXT: sete %al			; SDAG-NEXT: sete %al
	; FAST-LABEL: fcmp_ueq3			; FAST-LABEL: fcmp_ueq3
	; FAST: xorps %xmm1, %xmm1			; FAST: xorps %xmm1, %xmm1
				; FAST-NEXT: xorl %eax, %eax
	; FAST-NEXT: ucomiss %xmm1, %xmm0			; FAST-NEXT: ucomiss %xmm1, %xmm0
	; FAST-NEXT: sete %al			; FAST-NEXT: sete %al
	%1 = fcmp ueq float %x, 0.000000e+00			%1 = fcmp ueq float %x, 0.000000e+00
	ret i1 %1			ret i1 %1
	}			}

	define zeroext i1 @fcmp_ugt2(float %x) {			define zeroext i1 @fcmp_ugt2(float %x) {
	; SDAG-LABEL: fcmp_ugt2			; SDAG-LABEL: fcmp_ugt2
	; SDAG: ucomiss %xmm0, %xmm0			; SDAG: ucomiss %xmm0, %xmm0
	; SDAG-NEXT: setp %al			; SDAG-NEXT: setp %al
	; FAST-LABEL: fcmp_ugt2			; FAST-LABEL: fcmp_ugt2
	; FAST: ucomiss %xmm0, %xmm0			; FAST: ucomiss %xmm0, %xmm0
	; FAST-NEXT: setp %al			; FAST-NEXT: setp %al
	%1 = fcmp ugt float %x, %x			%1 = fcmp ugt float %x, %x
	ret i1 %1			ret i1 %1
	}			}

	define zeroext i1 @fcmp_ugt3(float %x) {			define zeroext i1 @fcmp_ugt3(float %x) {
	; SDAG-LABEL: fcmp_ugt3			; SDAG-LABEL: fcmp_ugt3
	; SDAG: xorps %xmm1, %xmm1			; SDAG: xorps %xmm1, %xmm1
	; SDAG-NEXT: ucomiss %xmm0, %xmm1			; SDAG-NEXT: ucomiss %xmm0, %xmm1
	; SDAG-NEXT: setb %al			; SDAG-NEXT: setb %al
	; FAST-LABEL: fcmp_ugt3			; FAST-LABEL: fcmp_ugt3
	; FAST: xorps %xmm1, %xmm1			; FAST: xorps %xmm1, %xmm1
				; FAST-NEXT: xorl %eax, %eax
	; FAST-NEXT: ucomiss %xmm0, %xmm1			; FAST-NEXT: ucomiss %xmm0, %xmm1
	; FAST-NEXT: setb %al			; FAST-NEXT: setb %al
	%1 = fcmp ugt float %x, 0.000000e+00			%1 = fcmp ugt float %x, 0.000000e+00
	ret i1 %1			ret i1 %1
	}			}

	define zeroext i1 @fcmp_uge2(float %x) {			define zeroext i1 @fcmp_uge2(float %x) {
	; SDAG-LABEL: fcmp_uge2			; SDAG-LABEL: fcmp_uge2
	; SDAG: movb $1, %al			; SDAG: movb $1, %al
	; FAST-LABEL: fcmp_uge2			; FAST-LABEL: fcmp_uge2
	; FAST: movb $1, %al			; FAST: movb $1, %al
	%1 = fcmp uge float %x, %x			%1 = fcmp uge float %x, %x
	ret i1 %1			ret i1 %1
	}			}

	define zeroext i1 @fcmp_uge3(float %x) {			define zeroext i1 @fcmp_uge3(float %x) {
	; SDAG-LABEL: fcmp_uge3			; SDAG-LABEL: fcmp_uge3
	; SDAG: xorps %xmm1, %xmm1			; SDAG: xorps %xmm1, %xmm1
	; SDAG-NEXT: ucomiss %xmm0, %xmm1			; SDAG-NEXT: ucomiss %xmm0, %xmm1
	; SDAG-NEXT: setbe %al			; SDAG-NEXT: setbe %al
	; FAST-LABEL: fcmp_uge3			; FAST-LABEL: fcmp_uge3
	; FAST: xorps %xmm1, %xmm1			; FAST: xorps %xmm1, %xmm1
				; FAST-NEXT: xorl %eax, %eax
	; FAST-NEXT: ucomiss %xmm0, %xmm1			; FAST-NEXT: ucomiss %xmm0, %xmm1
	; FAST-NEXT: setbe %al			; FAST-NEXT: setbe %al
	%1 = fcmp uge float %x, 0.000000e+00			%1 = fcmp uge float %x, 0.000000e+00
	ret i1 %1			ret i1 %1
	}			}

	define zeroext i1 @fcmp_ult2(float %x) {			define zeroext i1 @fcmp_ult2(float %x) {
	; SDAG-LABEL: fcmp_ult2			; SDAG-LABEL: fcmp_ult2
	; SDAG: ucomiss %xmm0, %xmm0			; SDAG: ucomiss %xmm0, %xmm0
	; SDAG-NEXT: setp %al			; SDAG-NEXT: setp %al
	; FAST-LABEL: fcmp_ult2			; FAST-LABEL: fcmp_ult2
	; FAST: ucomiss %xmm0, %xmm0			; FAST: ucomiss %xmm0, %xmm0
	; FAST-NEXT: setp %al			; FAST-NEXT: setp %al
	%1 = fcmp ult float %x, %x			%1 = fcmp ult float %x, %x
	ret i1 %1			ret i1 %1
	}			}

	define zeroext i1 @fcmp_ult3(float %x) {			define zeroext i1 @fcmp_ult3(float %x) {
	; SDAG-LABEL: fcmp_ult3			; SDAG-LABEL: fcmp_ult3
	; SDAG: xorps %xmm1, %xmm1			; SDAG: xorps %xmm1, %xmm1
	; SDAG-NEXT: ucomiss %xmm1, %xmm0			; SDAG-NEXT: ucomiss %xmm1, %xmm0
	; SDAG-NEXT: setb %al			; SDAG-NEXT: setb %al
	; FAST-LABEL: fcmp_ult3			; FAST-LABEL: fcmp_ult3
	; FAST: xorps %xmm1, %xmm1			; FAST: xorps %xmm1, %xmm1
				; FAST-NEXT: xorl %eax, %eax
	; FAST-NEXT: ucomiss %xmm1, %xmm0			; FAST-NEXT: ucomiss %xmm1, %xmm0
	; FAST-NEXT: setb %al			; FAST-NEXT: setb %al
	%1 = fcmp ult float %x, 0.000000e+00			%1 = fcmp ult float %x, 0.000000e+00
	ret i1 %1			ret i1 %1
	}			}

	define zeroext i1 @fcmp_ule2(float %x) {			define zeroext i1 @fcmp_ule2(float %x) {
	; SDAG-LABEL: fcmp_ule2			; SDAG-LABEL: fcmp_ule2
	; SDAG: movb $1, %al			; SDAG: movb $1, %al
	; FAST-LABEL: fcmp_ule2			; FAST-LABEL: fcmp_ule2
	; FAST: movb $1, %al			; FAST: movb $1, %al
	%1 = fcmp ule float %x, %x			%1 = fcmp ule float %x, %x
	ret i1 %1			ret i1 %1
	}			}

	define zeroext i1 @fcmp_ule3(float %x) {			define zeroext i1 @fcmp_ule3(float %x) {
	; SDAG-LABEL: fcmp_ule3			; SDAG-LABEL: fcmp_ule3
	; SDAG: xorps %xmm1, %xmm1			; SDAG: xorps %xmm1, %xmm1
	; SDAG-NEXT: ucomiss %xmm1, %xmm0			; SDAG-NEXT: ucomiss %xmm1, %xmm0
	; SDAG-NEXT: setbe %al			; SDAG-NEXT: setbe %al
	; FAST-LABEL: fcmp_ule3			; FAST-LABEL: fcmp_ule3
	; FAST: xorps %xmm1, %xmm1			; FAST: xorps %xmm1, %xmm1
				; FAST-NEXT: xorl %eax, %eax
	; FAST-NEXT: ucomiss %xmm1, %xmm0			; FAST-NEXT: ucomiss %xmm1, %xmm0
	; FAST-NEXT: setbe %al			; FAST-NEXT: setbe %al
	%1 = fcmp ule float %x, 0.000000e+00			%1 = fcmp ule float %x, 0.000000e+00
	ret i1 %1			ret i1 %1
	}			}

	define zeroext i1 @fcmp_une2(float %x) {			define zeroext i1 @fcmp_une2(float %x) {
	; SDAG-LABEL: fcmp_une2			; SDAG-LABEL: fcmp_une2
	[9 lines not shown]
	define zeroext i1 @fcmp_une3(float %x) {			define zeroext i1 @fcmp_une3(float %x) {
	; SDAG-LABEL: fcmp_une3			; SDAG-LABEL: fcmp_une3
	; SDAG: xorps %xmm1, %xmm1			; SDAG: xorps %xmm1, %xmm1
	; SDAG-NEXT: cmpneqss %xmm1, %xmm0			; SDAG-NEXT: cmpneqss %xmm1, %xmm0
	; SDAG-NEXT: movd %xmm0, %eax			; SDAG-NEXT: movd %xmm0, %eax
	; SDAG-NEXT: andl $1, %eax			; SDAG-NEXT: andl $1, %eax
	; FAST-LABEL: fcmp_une3			; FAST-LABEL: fcmp_une3
	; FAST: xorps %xmm1, %xmm1			; FAST: xorps %xmm1, %xmm1
				; FAST: xorl %eax, %eax
	; FAST-NEXT: ucomiss %xmm1, %xmm0			; FAST-NEXT: ucomiss %xmm1, %xmm0
	; FAST-NEXT: setne %al			; FAST-NEXT: setne %cl
	; FAST-NEXT: setp %cl			; FAST-NEXT: setp %al
	; FAST-NEXT: orb %al, %cl			; FAST-NEXT: orb %cl, %al
	%1 = fcmp une float %x, 0.000000e+00			%1 = fcmp une float %x, 0.000000e+00
	ret i1 %1			ret i1 %1
	}			}

	define zeroext i1 @icmp_eq2(i32 %x) {			define zeroext i1 @icmp_eq2(i32 %x) {
	; SDAG-LABEL: icmp_eq2			; SDAG-LABEL: icmp_eq2
	; SDAG: movb $1, %al			; SDAG: movb $1, %al
	; FAST-LABEL: icmp_eq2			; FAST-LABEL: icmp_eq2
	[86 lines not shown]

test/CodeGen/X86/fp128-cast.ll

[232 lines not shown]	entry:
ret i32 %conv		ret i32 %conv
; X32-LABEL: TestConst128:		; X32-LABEL: TestConst128:
; X32: calll __gttf2		; X32: calll __gttf2
; X32: retl		; X32: retl
;		;
; X64-LABEL: TestConst128:		; X64-LABEL: TestConst128:
; X64: movaps {{.*}}, %xmm1		; X64: movaps {{.*}}, %xmm1
; X64-NEXT: callq __gttf2		; X64-NEXT: callq __gttf2
; X64-NEXT: xorl		; X64: test
; X64-NEXT: test
; X64: retq		; X64: retq
}		}

; C code:		; C code:
; struct TestBits_ieee_ext {		; struct TestBits_ieee_ext {
; unsigned v1;		; unsigned v1;
; unsigned v2;		; unsigned v2;
; };		; };
[112 lines not shown]

test/CodeGen/X86/fp128-compare.ll

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -O2 -mtriple=x86_64-linux-android -mattr=+mmx | FileCheck %s			; RUN: llc < %s -O2 -mtriple=x86_64-linux-android -mattr=+mmx | FileCheck %s
	; RUN: llc < %s -O2 -mtriple=x86_64-linux-gnu -mattr=+mmx | FileCheck %s			; RUN: llc < %s -O2 -mtriple=x86_64-linux-gnu -mattr=+mmx | FileCheck %s

	define i32 @TestComp128GT(fp128 %d1, fp128 %d2) {			define i32 @TestComp128GT(fp128 %d1, fp128 %d2) {
				; CHECK-LABEL: TestComp128GT:
				; CHECK: # BB#0: # %entry
				; CHECK-NEXT: pushq %rax
				; CHECK-NEXT: .Ltmp0:
				; CHECK-NEXT: .cfi_def_cfa_offset 16
				; CHECK-NEXT: callq __gttf2
				; CHECK-NEXT: movl %eax, %ecx
				; CHECK-NEXT: xorl %eax, %eax
				; CHECK-NEXT: testl %ecx, %ecx
				; CHECK-NEXT: setg %al
				; CHECK-NEXT: popq %rcx
				; CHECK-NEXT: retq
	entry:			entry:
	%cmp = fcmp ogt fp128 %d1, %d2			%cmp = fcmp ogt fp128 %d1, %d2
	%conv = zext i1 %cmp to i32			%conv = zext i1 %cmp to i32
	ret i32 %conv			ret i32 %conv
	; CHECK-LABEL: TestComp128GT:
	; CHECK: callq __gttf2
	; CHECK: xorl %ecx, %ecx
	; CHECK: setg %cl
	; CHECK: movl %ecx, %eax
	; CHECK: retq
	}			}

	define i32 @TestComp128GE(fp128 %d1, fp128 %d2) {			define i32 @TestComp128GE(fp128 %d1, fp128 %d2) {
				; CHECK-LABEL: TestComp128GE:
				; CHECK: # BB#0: # %entry
				; CHECK-NEXT: pushq %rax
				; CHECK-NEXT: .Ltmp1:
				; CHECK-NEXT: .cfi_def_cfa_offset 16
				; CHECK-NEXT: callq __getf2
				; CHECK-NEXT: movl %eax, %ecx
				; CHECK-NEXT: xorl %eax, %eax
				; CHECK-NEXT: testl %ecx, %ecx
				; CHECK-NEXT: setns %al
				; CHECK-NEXT: popq %rcx
				; CHECK-NEXT: retq
	entry:			entry:
	%cmp = fcmp oge fp128 %d1, %d2			%cmp = fcmp oge fp128 %d1, %d2
	%conv = zext i1 %cmp to i32			%conv = zext i1 %cmp to i32
	ret i32 %conv			ret i32 %conv
	; CHECK-LABEL: TestComp128GE:
	; CHECK: callq __getf2
	; CHECK: xorl %ecx, %ecx
	; CHECK: testl %eax, %eax
	; CHECK: setns %cl
	; CHECK: movl %ecx, %eax
	; CHECK: retq
	}			}

	define i32 @TestComp128LT(fp128 %d1, fp128 %d2) {			define i32 @TestComp128LT(fp128 %d1, fp128 %d2) {
				; CHECK-LABEL: TestComp128LT:
				; CHECK: # BB#0: # %entry
				; CHECK-NEXT: pushq %rax
				; CHECK-NEXT: .Ltmp2:
				; CHECK-NEXT: .cfi_def_cfa_offset 16
				; CHECK-NEXT: callq __lttf2
				; CHECK-NEXT: shrl $31, %eax
				; CHECK-NEXT: popq %rcx
				; CHECK-NEXT: retq
	entry:			entry:
	%cmp = fcmp olt fp128 %d1, %d2			%cmp = fcmp olt fp128 %d1, %d2
	%conv = zext i1 %cmp to i32			%conv = zext i1 %cmp to i32
	ret i32 %conv			ret i32 %conv
	; CHECK-LABEL: TestComp128LT:
	; CHECK: callq __lttf2
	; CHECK-NEXT: shrl $31, %eax
	; CHECK: retq
	;
	; The 'shrl' is a special optimization in llvm to combine			; The 'shrl' is a special optimization in llvm to combine
	; the effect of 'fcmp olt' and 'zext'. The main purpose is			; the effect of 'fcmp olt' and 'zext'. The main purpose is
	; to test soften call to __lttf2.			; to test soften call to __lttf2.
	}			}
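
The comment in TestComp128LT says the `shrl` folds `fcmp olt` and `zext` together. A minimal sketch of why that fold is sound, assuming the libgcc convention that `__lttf2` returns a negative 32-bit integer when the first operand is strictly less than the second (and neither is NaN): `zext(result < 0)` is just the sign bit, which a logical right shift by 31 extracts. The function names below are hypothetical and exist only for this illustration; they are not part of the patch.

```python
MASK32 = 0xFFFFFFFF


def olt_via_compare(r: int) -> int:
    """Reference lowering: zext(r < 0), where r is the signed result of the call."""
    return 1 if r < 0 else 0


def olt_via_shrl(r: int) -> int:
    """Folded lowering: view r as a 32-bit two's-complement value and shift right by 31."""
    return (r & MASK32) >> 31


# Boundary and typical signed 32-bit values a comparison helper could return.
samples = [-2**31, -2, -1, 0, 1, 2, 2**31 - 1]
both_agree = all(olt_via_compare(r) == olt_via_shrl(r) for r in samples)
```

This fold only applies to the sign test used by `olt`. The other predicates in this file compare the result against zero in other ways (`setg`, `setle`, `sete`, and so on), which is why their CHECK lines still need the `testl` plus `setcc` sequence.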

	define i32 @TestComp128LE(fp128 %d1, fp128 %d2) {			define i32 @TestComp128LE(fp128 %d1, fp128 %d2) {
				; CHECK-LABEL: TestComp128LE:
				; CHECK: # BB#0: # %entry
				; CHECK-NEXT: pushq %rax
				; CHECK-NEXT: .Ltmp3:
				; CHECK-NEXT: .cfi_def_cfa_offset 16
				; CHECK-NEXT: callq __letf2
				; CHECK-NEXT: movl %eax, %ecx
				; CHECK-NEXT: xorl %eax, %eax
				; CHECK-NEXT: testl %ecx, %ecx
				; CHECK-NEXT: setle %al
				; CHECK-NEXT: popq %rcx
				; CHECK-NEXT: retq
	entry:			entry:
	%cmp = fcmp ole fp128 %d1, %d2			%cmp = fcmp ole fp128 %d1, %d2
	%conv = zext i1 %cmp to i32			%conv = zext i1 %cmp to i32
	ret i32 %conv			ret i32 %conv
	; CHECK-LABEL: TestComp128LE:
	; CHECK: callq __letf2
	; CHECK: xorl %ecx, %ecx
	; CHECK: testl %eax, %eax
	; CHECK: setle %cl
	; CHECK: movl %ecx, %eax
	; CHECK: retq
	}			}

	define i32 @TestComp128EQ(fp128 %d1, fp128 %d2) {			define i32 @TestComp128EQ(fp128 %d1, fp128 %d2) {
				; CHECK-LABEL: TestComp128EQ:
				; CHECK: # BB#0: # %entry
				; CHECK-NEXT: pushq %rax
				; CHECK-NEXT: .Ltmp4:
				; CHECK-NEXT: .cfi_def_cfa_offset 16
				; CHECK-NEXT: callq __eqtf2
				; CHECK-NEXT: movl %eax, %ecx
				; CHECK-NEXT: xorl %eax, %eax
				; CHECK-NEXT: testl %ecx, %ecx
				; CHECK-NEXT: sete %al
				; CHECK-NEXT: popq %rcx
				; CHECK-NEXT: retq
	entry:			entry:
	%cmp = fcmp oeq fp128 %d1, %d2			%cmp = fcmp oeq fp128 %d1, %d2
	%conv = zext i1 %cmp to i32			%conv = zext i1 %cmp to i32
	ret i32 %conv			ret i32 %conv
	; CHECK-LABEL: TestComp128EQ:
	; CHECK: callq __eqtf2
	; CHECK: xorl %ecx, %ecx
	; CHECK: testl %eax, %eax
	; CHECK: sete %cl
	; CHECK: movl %ecx, %eax
	; CHECK: retq
	}			}

	define i32 @TestComp128NE(fp128 %d1, fp128 %d2) {			define i32 @TestComp128NE(fp128 %d1, fp128 %d2) {
				; CHECK-LABEL: TestComp128NE:
				; CHECK: # BB#0: # %entry
				; CHECK-NEXT: pushq %rax
				; CHECK-NEXT: .Ltmp5:
				; CHECK-NEXT: .cfi_def_cfa_offset 16
				; CHECK-NEXT: callq __netf2
				; CHECK-NEXT: movl %eax, %ecx
				; CHECK-NEXT: xorl %eax, %eax
				; CHECK-NEXT: testl %ecx, %ecx
				; CHECK-NEXT: setne %al
				; CHECK-NEXT: popq %rcx
				; CHECK-NEXT: retq
	entry:			entry:
	%cmp = fcmp une fp128 %d1, %d2			%cmp = fcmp une fp128 %d1, %d2
	%conv = zext i1 %cmp to i32			%conv = zext i1 %cmp to i32
	ret i32 %conv			ret i32 %conv
	; CHECK-LABEL: TestComp128NE:
	; CHECK: callq __netf2
	; CHECK: xorl %ecx, %ecx
	; CHECK: testl %eax, %eax
	; CHECK: setne %cl
	; CHECK: movl %ecx, %eax
	; CHECK: retq
	}			}

	define fp128 @TestMax(fp128 %x, fp128 %y) {			define fp128 @TestMax(fp128 %x, fp128 %y) {
				; CHECK-LABEL: TestMax:
				; CHECK: # BB#0: # %entry
				; CHECK-NEXT: subq $40, %rsp
				; CHECK-NEXT: .Ltmp6:
				; CHECK-NEXT: .cfi_def_cfa_offset 48
				; CHECK-NEXT: movaps %xmm0, {{[0-9]+}}(%rsp) # 16-byte Spill
				; CHECK-NEXT: movaps %xmm1, (%rsp) # 16-byte Spill
				; CHECK-NEXT: callq __gttf2
				; CHECK-NEXT: movaps {{[0-9]+}}(%rsp), %xmm0 # 16-byte Reload
				; CHECK-NEXT: testl %eax, %eax
				; CHECK-NEXT: jg .LBB6_2
				; CHECK-NEXT: # BB#1: # %entry
				; CHECK-NEXT: movaps (%rsp), %xmm0 # 16-byte Reload
				; CHECK-NEXT: .LBB6_2: # %entry
				; CHECK-NEXT: addq $40, %rsp
				; CHECK-NEXT: retq
	entry:			entry:
	%cmp = fcmp ogt fp128 %x, %y			%cmp = fcmp ogt fp128 %x, %y
	%cond = select i1 %cmp, fp128 %x, fp128 %y			%cond = select i1 %cmp, fp128 %x, fp128 %y
	ret fp128 %cond			ret fp128 %cond
	; CHECK-LABEL: TestMax:
	; CHECK: movaps %xmm0
	; CHECK: movaps %xmm1
	; CHECK: callq __gttf2
	; CHECK: movaps {{.*}}, %xmm0
	; CHECK: testl %eax, %eax
	; CHECK: movaps {{.*}}, %xmm0
	; CHECK: retq
	}			}

test/CodeGen/X86/sse-intrinsics-fast-isel.ll

[564 lines not shown]
; X64-NEXT: retq		; X64-NEXT: retq
%res = call <4 x float> @llvm.x86.sse.cmp.ss(<4 x float> %a0, <4 x float> %a1, i8 3)		%res = call <4 x float> @llvm.x86.sse.cmp.ss(<4 x float> %a0, <4 x float> %a1, i8 3)
ret <4 x float> %res		ret <4 x float> %res
}		}

define i32 @test_mm_comieq_ss(<4 x float> %a0, <4 x float> %a1) nounwind {		define i32 @test_mm_comieq_ss(<4 x float> %a0, <4 x float> %a1) nounwind {
; X32-LABEL: test_mm_comieq_ss:		; X32-LABEL: test_mm_comieq_ss:
; X32: # BB#0:		; X32: # BB#0:
		; X32-NEXT: xorl %eax, %eax
; X32-NEXT: comiss %xmm1, %xmm0		; X32-NEXT: comiss %xmm1, %xmm0
; X32-NEXT: setnp %al		; X32-NEXT: setnp %cl
; X32-NEXT: sete %cl		; X32-NEXT: sete %al
; X32-NEXT: andb %al, %cl		; X32-NEXT: andb %cl, %al
; X32-NEXT: movzbl %cl, %eax
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: test_mm_comieq_ss:		; X64-LABEL: test_mm_comieq_ss:
; X64: # BB#0:		; X64: # BB#0:
		; X64-NEXT: xorl %eax, %eax
; X64-NEXT: comiss %xmm1, %xmm0		; X64-NEXT: comiss %xmm1, %xmm0
; X64-NEXT: setnp %al		; X64-NEXT: setnp %cl
; X64-NEXT: sete %cl		; X64-NEXT: sete %al
; X64-NEXT: andb %al, %cl		; X64-NEXT: andb %cl, %al
; X64-NEXT: movzbl %cl, %eax
; X64-NEXT: retq		; X64-NEXT: retq
%res = call i32 @llvm.x86.sse.comieq.ss(<4 x float> %a0, <4 x float> %a1)		%res = call i32 @llvm.x86.sse.comieq.ss(<4 x float> %a0, <4 x float> %a1)
ret i32 %res		ret i32 %res
}		}
declare i32 @llvm.x86.sse.comieq.ss(<4 x float>, <4 x float>) nounwind readnone		declare i32 @llvm.x86.sse.comieq.ss(<4 x float>, <4 x float>) nounwind readnone

define i32 @test_mm_comige_ss(<4 x float> %a0, <4 x float> %a1) nounwind {		define i32 @test_mm_comige_ss(<4 x float> %a0, <4 x float> %a1) nounwind {
; X32-LABEL: test_mm_comige_ss:		; X32-LABEL: test_mm_comige_ss:
[69 lines not shown]	; X64-NEXT: retq
%res = call i32 @llvm.x86.sse.comilt.ss(<4 x float> %a0, <4 x float> %a1)		%res = call i32 @llvm.x86.sse.comilt.ss(<4 x float> %a0, <4 x float> %a1)
ret i32 %res		ret i32 %res
}		}
declare i32 @llvm.x86.sse.comilt.ss(<4 x float>, <4 x float>) nounwind readnone		declare i32 @llvm.x86.sse.comilt.ss(<4 x float>, <4 x float>) nounwind readnone

define i32 @test_mm_comineq_ss(<4 x float> %a0, <4 x float> %a1) nounwind {		define i32 @test_mm_comineq_ss(<4 x float> %a0, <4 x float> %a1) nounwind {
; X32-LABEL: test_mm_comineq_ss:		; X32-LABEL: test_mm_comineq_ss:
; X32: # BB#0:		; X32: # BB#0:
		; X32-NEXT: xorl %eax, %eax
; X32-NEXT: comiss %xmm1, %xmm0		; X32-NEXT: comiss %xmm1, %xmm0
; X32-NEXT: setp %al		; X32-NEXT: setp %cl
; X32-NEXT: setne %cl		; X32-NEXT: setne %al
; X32-NEXT: orb %al, %cl		; X32-NEXT: orb %cl, %al
; X32-NEXT: movzbl %cl, %eax
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: test_mm_comineq_ss:		; X64-LABEL: test_mm_comineq_ss:
; X64: # BB#0:		; X64: # BB#0:
		; X64-NEXT: xorl %eax, %eax
; X64-NEXT: comiss %xmm1, %xmm0		; X64-NEXT: comiss %xmm1, %xmm0
; X64-NEXT: setp %al		; X64-NEXT: setp %cl
; X64-NEXT: setne %cl		; X64-NEXT: setne %al
; X64-NEXT: orb %al, %cl		; X64-NEXT: orb %cl, %al
; X64-NEXT: movzbl %cl, %eax
; X64-NEXT: retq		; X64-NEXT: retq
%res = call i32 @llvm.x86.sse.comineq.ss(<4 x float> %a0, <4 x float> %a1)		%res = call i32 @llvm.x86.sse.comineq.ss(<4 x float> %a0, <4 x float> %a1)
ret i32 %res		ret i32 %res
}		}
declare i32 @llvm.x86.sse.comineq.ss(<4 x float>, <4 x float>) nounwind readnone		declare i32 @llvm.x86.sse.comineq.ss(<4 x float>, <4 x float>) nounwind readnone

define i32 @test_mm_cvt_ss2si(<4 x float> %a0) nounwind {		define i32 @test_mm_cvt_ss2si(<4 x float> %a0) nounwind {
; X32-LABEL: test_mm_cvt_ss2si:		; X32-LABEL: test_mm_cvt_ss2si:
[1,372 lines not shown]	; X64-NEXT: retq
store <4 x float> %res2, <4 x float>* %a2, align 16		store <4 x float> %res2, <4 x float>* %a2, align 16
store <4 x float> %res3, <4 x float>* %a3, align 16		store <4 x float> %res3, <4 x float>* %a3, align 16
ret void		ret void
}		}

define i32 @test_mm_ucomieq_ss(<4 x float> %a0, <4 x float> %a1) nounwind {		define i32 @test_mm_ucomieq_ss(<4 x float> %a0, <4 x float> %a1) nounwind {
; X32-LABEL: test_mm_ucomieq_ss:		; X32-LABEL: test_mm_ucomieq_ss:
; X32: # BB#0:		; X32: # BB#0:
		; X32-NEXT: xorl %eax, %eax
; X32-NEXT: ucomiss %xmm1, %xmm0		; X32-NEXT: ucomiss %xmm1, %xmm0
; X32-NEXT: setnp %al		; X32-NEXT: setnp %cl
; X32-NEXT: sete %cl		; X32-NEXT: sete %al
; X32-NEXT: andb %al, %cl		; X32-NEXT: andb %cl, %al
; X32-NEXT: movzbl %cl, %eax
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: test_mm_ucomieq_ss:		; X64-LABEL: test_mm_ucomieq_ss:
; X64: # BB#0:		; X64: # BB#0:
		; X64-NEXT: xorl %eax, %eax
; X64-NEXT: ucomiss %xmm1, %xmm0		; X64-NEXT: ucomiss %xmm1, %xmm0
; X64-NEXT: setnp %al		; X64-NEXT: setnp %cl
; X64-NEXT: sete %cl		; X64-NEXT: sete %al
; X64-NEXT: andb %al, %cl		; X64-NEXT: andb %cl, %al
; X64-NEXT: movzbl %cl, %eax
; X64-NEXT: retq		; X64-NEXT: retq
%res = call i32 @llvm.x86.sse.ucomieq.ss(<4 x float> %a0, <4 x float> %a1)		%res = call i32 @llvm.x86.sse.ucomieq.ss(<4 x float> %a0, <4 x float> %a1)
ret i32 %res		ret i32 %res
}		}
declare i32 @llvm.x86.sse.ucomieq.ss(<4 x float>, <4 x float>) nounwind readnone		declare i32 @llvm.x86.sse.ucomieq.ss(<4 x float>, <4 x float>) nounwind readnone

define i32 @test_mm_ucomige_ss(<4 x float> %a0, <4 x float> %a1) nounwind {		define i32 @test_mm_ucomige_ss(<4 x float> %a0, <4 x float> %a1) nounwind {
; X32-LABEL: test_mm_ucomige_ss:		; X32-LABEL: test_mm_ucomige_ss:
[69 lines not shown]	; X64-NEXT: retq
%res = call i32 @llvm.x86.sse.ucomilt.ss(<4 x float> %a0, <4 x float> %a1)		%res = call i32 @llvm.x86.sse.ucomilt.ss(<4 x float> %a0, <4 x float> %a1)
ret i32 %res		ret i32 %res
}		}
declare i32 @llvm.x86.sse.ucomilt.ss(<4 x float>, <4 x float>) nounwind readnone		declare i32 @llvm.x86.sse.ucomilt.ss(<4 x float>, <4 x float>) nounwind readnone

define i32 @test_mm_ucomineq_ss(<4 x float> %a0, <4 x float> %a1) nounwind {		define i32 @test_mm_ucomineq_ss(<4 x float> %a0, <4 x float> %a1) nounwind {
; X32-LABEL: test_mm_ucomineq_ss:		; X32-LABEL: test_mm_ucomineq_ss:
; X32: # BB#0:		; X32: # BB#0:
		; X32-NEXT: xorl %eax, %eax
; X32-NEXT: ucomiss %xmm1, %xmm0		; X32-NEXT: ucomiss %xmm1, %xmm0
; X32-NEXT: setp %al		; X32-NEXT: setp %cl
; X32-NEXT: setne %cl		; X32-NEXT: setne %al
; X32-NEXT: orb %al, %cl		; X32-NEXT: orb %cl, %al
; X32-NEXT: movzbl %cl, %eax
; X32-NEXT: retl		; X32-NEXT: retl
;		;
; X64-LABEL: test_mm_ucomineq_ss:		; X64-LABEL: test_mm_ucomineq_ss:
; X64: # BB#0:		; X64: # BB#0:
		; X64-NEXT: xorl %eax, %eax
; X64-NEXT: ucomiss %xmm1, %xmm0		; X64-NEXT: ucomiss %xmm1, %xmm0
; X64-NEXT: setp %al		; X64-NEXT: setp %cl
; X64-NEXT: setne %cl		; X64-NEXT: setne %al
; X64-NEXT: orb %al, %cl		; X64-NEXT: orb %cl, %al
; X64-NEXT: movzbl %cl, %eax
; X64-NEXT: retq		; X64-NEXT: retq
%res = call i32 @llvm.x86.sse.ucomineq.ss(<4 x float> %a0, <4 x float> %a1)		%res = call i32 @llvm.x86.sse.ucomineq.ss(<4 x float> %a0, <4 x float> %a1)
ret i32 %res		ret i32 %res
}		}
declare i32 @llvm.x86.sse.ucomineq.ss(<4 x float>, <4 x float>) nounwind readnone		declare i32 @llvm.x86.sse.ucomineq.ss(<4 x float>, <4 x float>) nounwind readnone

define <4 x float> @test_mm_undefined_ps() {		define <4 x float> @test_mm_undefined_ps() {
; X32-LABEL: test_mm_undefined_ps:		; X32-LABEL: test_mm_undefined_ps:
[109 lines not shown]

test/CodeGen/X86/sse-intrinsics-x86.ll

		; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; NOTE: Assertions have been autogenerated by update_llc_test_checks.py		; NOTE: Assertions have been autogenerated by update_llc_test_checks.py
; RUN: llc < %s -mtriple=i386-apple-darwin -mattr=-avx,+sse | FileCheck %s --check-prefix=SSE		; RUN: llc < %s -mtriple=i386-apple-darwin -mattr=-avx,+sse | FileCheck %s --check-prefix=SSE
; RUN: llc < %s -mtriple=i386-apple-darwin -mcpu=knl | FileCheck %s --check-prefix=KNL		; RUN: llc < %s -mtriple=i386-apple-darwin -mcpu=knl | FileCheck %s --check-prefix=KNL

define <4 x float> @test_x86_sse_add_ss(<4 x float> %a0, <4 x float> %a1) {		define <4 x float> @test_x86_sse_add_ss(<4 x float> %a0, <4 x float> %a1) {
; SSE-LABEL: test_x86_sse_add_ss:		; SSE-LABEL: test_x86_sse_add_ss:
; SSE: ## BB#0:		; SSE: ## BB#0:
; SSE-NEXT: addss %xmm1, %xmm0		; SSE-NEXT: addss %xmm1, %xmm0
[39 lines not shown]	; KNL-NEXT: retl
ret <4 x float> %res		ret <4 x float> %res
}		}
declare <4 x float> @llvm.x86.sse.cmp.ss(<4 x float>, <4 x float>, i8) nounwind readnone		declare <4 x float> @llvm.x86.sse.cmp.ss(<4 x float>, <4 x float>, i8) nounwind readnone


define i32 @test_x86_sse_comieq_ss(<4 x float> %a0, <4 x float> %a1) {		define i32 @test_x86_sse_comieq_ss(<4 x float> %a0, <4 x float> %a1) {
; SSE-LABEL: test_x86_sse_comieq_ss:		; SSE-LABEL: test_x86_sse_comieq_ss:
; SSE: ## BB#0:		; SSE: ## BB#0:
		; SSE-NEXT: xorl %eax, %eax
; SSE-NEXT: comiss %xmm1, %xmm0		; SSE-NEXT: comiss %xmm1, %xmm0
; SSE-NEXT: setnp %al		; SSE-NEXT: setnp %cl
; SSE-NEXT: sete %cl		; SSE-NEXT: sete %al
; SSE-NEXT: andb %al, %cl		; SSE-NEXT: andb %cl, %al
; SSE-NEXT: movzbl %cl, %eax
; SSE-NEXT: retl		; SSE-NEXT: retl
;		;
; KNL-LABEL: test_x86_sse_comieq_ss:		; KNL-LABEL: test_x86_sse_comieq_ss:
; KNL: ## BB#0:		; KNL: ## BB#0:
		; KNL-NEXT: xorl %eax, %eax
; KNL-NEXT: vcomiss %xmm1, %xmm0		; KNL-NEXT: vcomiss %xmm1, %xmm0
; KNL-NEXT: setnp %al		; KNL-NEXT: setnp %cl
; KNL-NEXT: sete %cl		; KNL-NEXT: sete %al
; KNL-NEXT: andb %al, %cl		; KNL-NEXT: andb %cl, %al
; KNL-NEXT: movzbl %cl, %eax
; KNL-NEXT: retl		; KNL-NEXT: retl
%res = call i32 @llvm.x86.sse.comieq.ss(<4 x float> %a0, <4 x float> %a1) ; <i32> [#uses=1]		%res = call i32 @llvm.x86.sse.comieq.ss(<4 x float> %a0, <4 x float> %a1) ; <i32> [#uses=1]
ret i32 %res		ret i32 %res
}		}
declare i32 @llvm.x86.sse.comieq.ss(<4 x float>, <4 x float>) nounwind readnone		declare i32 @llvm.x86.sse.comieq.ss(<4 x float>, <4 x float>) nounwind readnone


define i32 @test_x86_sse_comige_ss(<4 x float> %a0, <4 x float> %a1) {		define i32 @test_x86_sse_comige_ss(<4 x float> %a0, <4 x float> %a1) {
[74 lines not shown]	; KNL-NEXT: retl
ret i32 %res		ret i32 %res
}		}
declare i32 @llvm.x86.sse.comilt.ss(<4 x float>, <4 x float>) nounwind readnone		declare i32 @llvm.x86.sse.comilt.ss(<4 x float>, <4 x float>) nounwind readnone


define i32 @test_x86_sse_comineq_ss(<4 x float> %a0, <4 x float> %a1) {		define i32 @test_x86_sse_comineq_ss(<4 x float> %a0, <4 x float> %a1) {
; SSE-LABEL: test_x86_sse_comineq_ss:		; SSE-LABEL: test_x86_sse_comineq_ss:
; SSE: ## BB#0:		; SSE: ## BB#0:
		; SSE-NEXT: xorl %eax, %eax
; SSE-NEXT: comiss %xmm1, %xmm0		; SSE-NEXT: comiss %xmm1, %xmm0
; SSE-NEXT: setp %al		; SSE-NEXT: setp %cl
; SSE-NEXT: setne %cl		; SSE-NEXT: setne %al
; SSE-NEXT: orb %al, %cl		; SSE-NEXT: orb %cl, %al
; SSE-NEXT: movzbl %cl, %eax
; SSE-NEXT: retl		; SSE-NEXT: retl
;		;
; KNL-LABEL: test_x86_sse_comineq_ss:		; KNL-LABEL: test_x86_sse_comineq_ss:
; KNL: ## BB#0:		; KNL: ## BB#0:
		; KNL-NEXT: xorl %eax, %eax
; KNL-NEXT: vcomiss %xmm1, %xmm0		; KNL-NEXT: vcomiss %xmm1, %xmm0
; KNL-NEXT: setp %al		; KNL-NEXT: setp %cl
; KNL-NEXT: setne %cl		; KNL-NEXT: setne %al
; KNL-NEXT: orb %al, %cl		; KNL-NEXT: orb %cl, %al
; KNL-NEXT: movzbl %cl, %eax
; KNL-NEXT: retl		; KNL-NEXT: retl
%res = call i32 @llvm.x86.sse.comineq.ss(<4 x float> %a0, <4 x float> %a1) ; <i32> [#uses=1]		%res = call i32 @llvm.x86.sse.comineq.ss(<4 x float> %a0, <4 x float> %a1) ; <i32> [#uses=1]
ret i32 %res		ret i32 %res
}		}
declare i32 @llvm.x86.sse.comineq.ss(<4 x float>, <4 x float>) nounwind readnone		declare i32 @llvm.x86.sse.comineq.ss(<4 x float>, <4 x float>) nounwind readnone


define <4 x float> @test_x86_sse_cvtsi2ss(<4 x float> %a0) {		define <4 x float> @test_x86_sse_cvtsi2ss(<4 x float> %a0) {
[306 lines not shown]	; KNL-NEXT: retl
ret <4 x float> %res		ret <4 x float> %res
}		}
declare <4 x float> @llvm.x86.sse.sub.ss(<4 x float>, <4 x float>) nounwind readnone		declare <4 x float> @llvm.x86.sse.sub.ss(<4 x float>, <4 x float>) nounwind readnone


define i32 @test_x86_sse_ucomieq_ss(<4 x float> %a0, <4 x float> %a1) {		define i32 @test_x86_sse_ucomieq_ss(<4 x float> %a0, <4 x float> %a1) {
; SSE-LABEL: test_x86_sse_ucomieq_ss:		; SSE-LABEL: test_x86_sse_ucomieq_ss:
; SSE: ## BB#0:		; SSE: ## BB#0:
		; SSE-NEXT: xorl %eax, %eax
; SSE-NEXT: ucomiss %xmm1, %xmm0		; SSE-NEXT: ucomiss %xmm1, %xmm0
; SSE-NEXT: setnp %al		; SSE-NEXT: setnp %cl
; SSE-NEXT: sete %cl		; SSE-NEXT: sete %al
; SSE-NEXT: andb %al, %cl		; SSE-NEXT: andb %cl, %al
; SSE-NEXT: movzbl %cl, %eax
; SSE-NEXT: retl		; SSE-NEXT: retl
;		;
; KNL-LABEL: test_x86_sse_ucomieq_ss:		; KNL-LABEL: test_x86_sse_ucomieq_ss:
; KNL: ## BB#0:		; KNL: ## BB#0:
		; KNL-NEXT: xorl %eax, %eax
; KNL-NEXT: vucomiss %xmm1, %xmm0		; KNL-NEXT: vucomiss %xmm1, %xmm0
; KNL-NEXT: setnp %al		; KNL-NEXT: setnp %cl
; KNL-NEXT: sete %cl		; KNL-NEXT: sete %al
; KNL-NEXT: andb %al, %cl		; KNL-NEXT: andb %cl, %al
; KNL-NEXT: movzbl %cl, %eax
; KNL-NEXT: retl		; KNL-NEXT: retl
%res = call i32 @llvm.x86.sse.ucomieq.ss(<4 x float> %a0, <4 x float> %a1) ; <i32> [#uses=1]		%res = call i32 @llvm.x86.sse.ucomieq.ss(<4 x float> %a0, <4 x float> %a1) ; <i32> [#uses=1]
ret i32 %res		ret i32 %res
}		}
declare i32 @llvm.x86.sse.ucomieq.ss(<4 x float>, <4 x float>) nounwind readnone		declare i32 @llvm.x86.sse.ucomieq.ss(<4 x float>, <4 x float>) nounwind readnone


define i32 @test_x86_sse_ucomige_ss(<4 x float> %a0, <4 x float> %a1) {		define i32 @test_x86_sse_ucomige_ss(<4 x float> %a0, <4 x float> %a1) {
[74 lines not shown]	; KNL-NEXT: retl
ret i32 %res		ret i32 %res
}		}
declare i32 @llvm.x86.sse.ucomilt.ss(<4 x float>, <4 x float>) nounwind readnone		declare i32 @llvm.x86.sse.ucomilt.ss(<4 x float>, <4 x float>) nounwind readnone


define i32 @test_x86_sse_ucomineq_ss(<4 x float> %a0, <4 x float> %a1) {		define i32 @test_x86_sse_ucomineq_ss(<4 x float> %a0, <4 x float> %a1) {
; SSE-LABEL: test_x86_sse_ucomineq_ss:		; SSE-LABEL: test_x86_sse_ucomineq_ss:
; SSE: ## BB#0:		; SSE: ## BB#0:
		; SSE-NEXT: xorl %eax, %eax
; SSE-NEXT: ucomiss %xmm1, %xmm0		; SSE-NEXT: ucomiss %xmm1, %xmm0
; SSE-NEXT: setp %al		; SSE-NEXT: setp %cl
; SSE-NEXT: setne %cl		; SSE-NEXT: setne %al
; SSE-NEXT: orb %al, %cl		; SSE-NEXT: orb %cl, %al
; SSE-NEXT: movzbl %cl, %eax
; SSE-NEXT: retl		; SSE-NEXT: retl
;		;
; KNL-LABEL: test_x86_sse_ucomineq_ss:		; KNL-LABEL: test_x86_sse_ucomineq_ss:
; KNL: ## BB#0:		; KNL: ## BB#0:
		; KNL-NEXT: xorl %eax, %eax
; KNL-NEXT: vucomiss %xmm1, %xmm0		; KNL-NEXT: vucomiss %xmm1, %xmm0
; KNL-NEXT: setp %al		; KNL-NEXT: setp %cl
; KNL-NEXT: setne %cl		; KNL-NEXT: setne %al
; KNL-NEXT: orb %al, %cl		; KNL-NEXT: orb %cl, %al
; KNL-NEXT: movzbl %cl, %eax
; KNL-NEXT: retl		; KNL-NEXT: retl
%res = call i32 @llvm.x86.sse.ucomineq.ss(<4 x float> %a0, <4 x float> %a1) ; <i32> [#uses=1]		%res = call i32 @llvm.x86.sse.ucomineq.ss(<4 x float> %a0, <4 x float> %a1) ; <i32> [#uses=1]
ret i32 %res		ret i32 %res
}		}
declare i32 @llvm.x86.sse.ucomineq.ss(<4 x float>, <4 x float>) nounwind readnone		declare i32 @llvm.x86.sse.ucomineq.ss(<4 x float>, <4 x float>) nounwind readnone


Diff 67362
