
[X86] Generalized transformation of `definstr gr8; movzx gr32, gr8` to `xor gr32, gr32; definstr gr8`
Needs ReviewPublic

Authored by bryant on Aug 7 2016, 9:02 PM.

Details

Summary

As indicated by the title, this post-register allocation pre-rewrite pass
generalizes D21774 by matching patterns of the form,

gr8<def> = instr_defining_gr8  # that may or may not use eflags
gr32<def> = movzx gr8

into,

gr32 = mov32r0 eflags<imp-def>  # carefully avoids clobbering eflags
...
gr8<def> = instr_defining_gr8

with the goal of reducing read stalls, partial register stalls, micro-ops, and
overall binary size.
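As a concrete source-level trigger (a hypothetical example, not taken from this review): a comparison whose boolean result is widened to int typically compiles to a setcc into a GR8 followed by a movzx into a GR32, which is exactly the shape this pass rewrites:

```c
#include <assert.h>

/* Without this pass the backend typically emits
 *     cmpl $7, %edi; seta %al; movzbl %al, %eax
 * With it, the zero-extend becomes a flag-preserving
 *     xorl %eax, %eax
 * hoisted above the compare, leaving the bare seta. */
int gt7(unsigned x) { return x > 7; }
```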

Except for a few rare cases, it never performs worse than D21774, and in many
cases does surprisingly better. IACA-annotated assembly output can be found
at https://reviews.llvm.org/P7213 (for x86-64) and at
https://reviews.llvm.org/P7214 (for x86-32).

Not all of the tests have been updated; this is still a work in progress.

EDIT: grammar.

Diff Detail

Repository
rL LLVM

Event Timeline

bryant updated this revision to Diff 67116.Aug 7 2016, 9:02 PM
bryant retitled this revision from to [X86] Generalized transformation of `definstr gr8; movzx gr32, gr8` to `xor gr32, gr32; definstr gr8`.
bryant updated this object.
bryant added reviewers: llvm-commits, mkuper.
bryant set the repository for this revision to rL LLVM.
bryant updated this object.
bryant added a comment.Aug 7 2016, 9:06 PM

I should add that the benchmark data in the two aforementioned links were generated by running llc -march={x86,x86-64} -mattr=+sse,+sse4.2 -O3 over the entire test set under test/CodeGen/X86. An appreciable number of cases do not compile under these flags (or simply produce assembly output that gas cannot assemble) and have been omitted by my annotator.

bryant added a comment.Aug 7 2016, 9:52 PM

I would also like to note that because this pass works by register
re-allocation, it _never_ "pessimizes" the way D21774 does. For instance, under
-march=x86 (diff of assembler output of
test/CodeGen/X86/sse42-intrinsics-x86.ll, with "-" indicating D21774 and "+"
indicating this pass):

 test_x86_sse42_pcmpestria128:
-# Throughput: 2.25; Uops: 8; Latency: 7; Size: 36
+# Throughput: 1.55; Uops: 6; Latency: 3; Size: 33
-    pushl    %ebx
     movl    $7, %eax
     movl    $7, %edx
-    xorl    %ebx, %ebx
     pcmpestri    $7, %xmm1, %xmm0
-    seta    %bl
-    movl    %ebx, %eax
-    popl    %ebx
+    seta    %al
+    movzbl    %al, %eax
 retl

Or perhaps less obviously (from cmpxchg-i1.ll under x86-64):

 cmpxchg_zext:
-# Throughput: 2.05; Uops: 6; Latency: 10; Size: 24
+# Throughput: 1.85; Uops: 7; Latency: 10; Size: 23
-    xorl    %ecx, %ecx
     movl    %esi, %eax
     lock        cmpxchgl    %edx, (%rdi)
-    sete    %cl
-    movl    %ecx, %eax
+    sete    %al
+    movzbl    %al, %eax
 retq

On the other hand, because it matches on any gr8-defining instruction (not just
setccs), it can do things like (from pre-ra-sched.ll under x86-64):

 test:
-# Throughput: 11.05; Uops: 27; Latency: 14; Size: 77
+# Throughput: 6.55; Uops: 24; Latency: 14; Size: 77
     movzbl    1(%rdi), %r9d
-    movb    2(%rdi), %al
-    xorb    %r9b, %al
+    xorl    %r10d, %r10d
+    movb    2(%rdi), %r10b
+    xorb    %r9b, %r10b
     movzbl    3(%rdi), %esi
-    movb    4(%rdi), %cl
-    xorb    %sil, %cl
+    xorl    %eax, %eax
+    movb    4(%rdi), %al
+    xorb    %sil, %al
     movzbl    5(%rdi), %r8d
-    movb    6(%rdi), %dl
-    xorb    %r8b, %dl
+    xorl    %ecx, %ecx
+    movb    6(%rdi), %cl
+    xorb    %r8b, %cl
     cmpb    $0, (%rdi)
-    movzbl    %al, %edi
-    cmovnel    %r9d, %edi
-    movzbl    %cl, %eax
+    cmovnel    %r9d, %r10d
     cmovnel    %esi, %eax
-    movzbl    %dl, %ecx
     cmovnel    %r8d, %ecx
     testl    %r9d, %r9d
-    cmovnel    %edi, %eax
+    cmovnel    %r10d, %eax
     testl    %esi, %esi
     cmovel    %ecx, %eax
     retq

which, if IACA is to be believed, is a 40% throughput improvement for free.
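For instance, a byte-level helper along these lines (a hypothetical reconstruction; the actual pre-ra-sched.ll input may differ) produces movb and xorb instructions, each a GR8 def that a setcc-only peephole like D21774 cannot touch:

```c
#include <assert.h>

/* p[i] ^ p[i + 1] compiles to byte loads (movb) and a byte xor
 * (xorb); when the result is later widened to 32 bits, this pass can
 * pair the GR8 def with a hoisted xor of the full destination
 * register, as in the diff above. */
unsigned char xor_adjacent(const unsigned char *p, int i) {
    return (unsigned char)(p[i] ^ p[i + 1]);
}
```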

Also, it should be noted that this differential depends on the zip iterator
introduced in D23252.

mkuper edited edge metadata.Aug 8 2016, 12:35 AM

Hi bryant,

Thanks a lot for working on this! The improvements look really nice, and it's great that this fixes the pessimizations from rL274692.

I haven't really looked at the patch yet, only skimmed it briefly - so I still don't have even the slightest opinion on the logic.
So, for now, some very general comments/questions:

  • Have you considered adding logic to ExecutionDepsFix or X86FixupBWInsts, as opposed to a new pass? I haven't thought about this too much myself, but those are post-RA passes with somewhat similar goals, and it may make sense to share some of the code.
  • How generic is this? E.g. does this handle PR28442?
  • Do you know why this is failing on cases that we catch with rL274692? If the improvement is large enough we don't have to be strictly better than existing code (rL274692 itself wasn't :) ), but it'd be good to understand what's going on so we can at least guess whether it can happen in "interesting" non-toy cases.
  • This is a fairly large patch. It's probably fairly self-contained, and I don't immediately see a way to break it into meaningful functional pieces. However, that makes it a priori harder to review. So it'd be really nice to make reviewers' lives easier by providing copious documentation. :)
  • There's a decent amount of generic code that doesn't look like it should live in the pass. If it's generally useful, then it should live in one of the utility headers. If not, and it's only ever instantiated for a single type, then we can get rid of the extra template complexity. (Also, beware of limited compiler support. We need to build LLVM with some fairly old or odd compilers - MSVC 2013, first and foremost. If you haven't seen a pattern used in LLVM, it may be because we still want to be buildable with compilers that don't support it.)
  • A couple of pervasive differences from the LLVM coding conventions I've noticed from skimming:
    • We generally use camelcase for identifiers, with variables starting with an uppercase letter, and functions with lowercase. Common acronyms (like TRI) are all-caps. There are some exceptions ("standard" constructs like "int i", lowercase with underscores where we want to fit STL style, changes to old code that uses a different convention), but those are fairly rare.
    • We don't usually explicitly compare to nullptr.
bryant updated this revision to Diff 67362.Aug 9 2016, 9:49 AM
bryant added a parent revision: D23252: [ADT] Zip range adapter.
bryant edited edge metadata.

Fixed compile error caused by lack of "const" in return type.

bryant updated this revision to Diff 67980.Aug 14 2016, 11:23 AM
  • Rely on RAII to clean up unused MOV32r0
  • Outsource register allocation hints to VirtRegAuxInfo::copyHint
  • Add regmask constraints if no hints are found
bryant updated this revision to Diff 67981.Aug 14 2016, 11:37 AM

Conform to camel-case naming style for functions.

bryant updated this revision to Diff 67982.Aug 14 2016, 1:00 PM

Conform to variable name casing convention.

bryant updated this revision to Diff 67985.Aug 14 2016, 1:10 PM

Convert explicit checks for nullptr to operator bool.

Hi bryant,

Thanks a lot for working on this! The improvements look really nice, and it's
great that this fixes the pessimizations from rL274692.

I haven't really looked at the patch yet, only skimmed it briefly - so I still
don't have even the slightest opinion on the logic. So, for now, some very
general comments/questions:

Please, please review. I've put a great deal of thought into this.

  • Have you considered adding logic to ExecutionDepsFix or X86FixupBWInsts, as opposed to a new pass? I haven't thought about this too much myself, but those are post-RA passes with somewhat similar goals, and it may make sense to share some of the code.

How do you propose doing this in either of those two passes? And in any case,
the set of heuristics relies on virtual register liveness data that is rewritten
away long before pre-sched/pre-emit passes execute.

  • How generic is this? E.g. does this handle PR28442?

It matches on any GR8-defining instruction, so I'd say...as generic as can be?
And yes, of course it handles PR28442:

19:54:19 ~/3rd/llvm> cat > pr28442.c
int foo(int a, int b, int c) {
  return (a > 0 && b > 0 && c > 0);
}
20:15:40 ~/3rd/llvm> clang -o - -S -emit-llvm -O3 pr28442.c | llc -filetype=obj -O3 -o setccfixup.o -setcc-fixup
20:16:07 ~/3rd/llvm> clang -o - -S -emit-llvm -O3 pr28442.c | llc -filetype=obj -O3 -o zextfixup.o
20:16:34 ~/3rd/llvm> diff -u <(python iacapipe.py setccfixup.o) <(python iacapipe.py zextfixup.o)
--- /dev/fd/63  2016-08-14 20:16:48.477885771 +0000
+++ /dev/fd/62  2016-08-14 20:16:48.477885771 +0000
@@ -1,13 +1,13 @@
-# Throughput: 2.65; Uops: 10; Latency: 5; Size: 23
+# Throughput: 2.65; Uops: 9; Latency: 5; Size: 22
 foo:
     test   %edi,%edi
     setg   %al
     test   %esi,%esi
     setg   %cl
     and    %al,%cl
+    xor    %eax,%eax
     test   %edx,%edx
     setg   %al
     and    %cl,%al
-    movzbl %al,%eax
     retq
  • Do you know why this is failing on cases that we catch with rL274692? If the improvement is large enough we don't have to be strictly better than existing code (rL274692 itself wasn't :) ), but it'd be good to understand what's going on so we can at least guess whether it can happen in "interesting" non-toy cases.

From my tests on CodeGen/X86/*.ll, it only ever fails on x86-32 for two
reasons:

  1. The x86-32 allocation order for CSRs prioritizes ESI and EDI over EBX. Because this pass never touches unused CSRs, it is possible for a function to allocate E{AX,CX,DX,SI,DI} but not EBX, thus reducing the pool of available GR32_with_sub8bit registers by one. This is illustrated by 2008-09-11-CoalescerBug.ll:

20:22:18 ~/3rd/llvm> diff -u --unified=9999999 <(python iacapipe.py obj/kuper-x86/2008-09-11-CoalescerBug.o) <(python iacapipe.py obj/control-x86/2008-09-11-CoalescerBug.o)
--- /dev/fd/63  2016-08-14 20:23:00.352808087 +0000
+++ /dev/fd/62  2016-08-14 20:23:00.352808087 +0000
@@ -1,35 +1,34 @@
-# Throughput: 9.60; Uops: 43; Latency: 20; Size: 98
+# Throughput: 10.10; Uops: 45; Latency: 19; Size: 98
 func_3:
-    push   %ebx
+    push   %edi
     push   %esi
     push   %eax
     movzwl 0x0,%esi
     and    $0x1,%esi
     movl   $0x1,(%esp)
     call   15 <func_3+0x15>
     xor    %ecx,%ecx
     cmp    $0x2,%eax
     setl   %cl
-    xor    %ebx,%ebx
     cmp    %ecx,%esi
-    setge  %bl
-    xor    %eax,%eax
+    setge  %al
+    movzbl %al,%esi
     cmpw   $0x0,0x0
     sete   %al
-    mov    %eax,%esi
+    movzbl %al,%edi
     movl   $0x1,(%esp)
     call   3f <func_3+0x3f>
     xor    %ecx,%ecx
-    cmp    %eax,%esi
+    cmp    %eax,%edi
     setge  %cl
-    sub    %ecx,%ebx
-    cmp    $0x2,%ebx
+    sub    %ecx,%esi
+    cmp    $0x2,%esi
     sbb    %eax,%eax
     and    $0x1,%eax
     mov    %eax,(%esp)
     call   58 <func_3+0x58>
     add    $0x4,%esp
     pop    %esi
-    pop    %ebx
+    pop    %edi
     ret
  2. It's also possible for RA to spill the GR8-defining instruction's result, thus preventing this pass from recognizing the pattern. This is seen in 2009-08-23-SubRegReuseUndo.ll:

--- annot/kuper-x86/2009-08-23-SubRegReuseUndo.o	2016-08-14 20:27:55.032325645 +0000
+++ annot/control-x86/2009-08-23-SubRegReuseUndo.o	2016-08-14 20:27:07.079072090 +0000
@@ -1,88 +1,88 @@
-# Throughput: 31.00; Uops: 113; Latency: 32; Size: 285
+# Throughput: 32.00; Uops: 116; Latency: 32; Size: 290
 uint80:
     push   %ebp
     push   %ebx
     push   %edi
     push   %esi
     sub    $0x1c,%esp
     movsbl 0x30(%esp),%ecx
-    xor    %eax,%eax
     test   %cx,%cx
-    setne  %al
-    mov    %eax,%esi
+    setne  0x1b(%esp)
     movzwl %cx,%eax
-    mov    %ecx,%edi
+    mov    %ecx,%esi
     mov    %eax,(%esp)
     mov    $0x0,%eax
     movsbl %al,%eax
     mov    %eax,0x4(%esp)
     call   29 <uint80+0x29>
     mov    %eax,%ebx
     or     $0x1,%bl
     movl   $0x1,(%esp)
     call   3c <uint80+0x3c>
     movl   $0x0,0x4(%esp)
     movl   $0x0,(%esp)
     call   50 <uint80+0x50>
-    mov    %edi,%ecx
+    mov    %esi,%ecx
+    mov    %ecx,%edi
     xor    %cl,%al
     xor    %bl,%al
     mov    %eax,%ebp
-    mov    %esi,0x4(%esp)
+    movzbl 0x1b(%esp),%eax
+    mov    %eax,0x4(%esp)
     mov    $0x0,%eax
     movzwl %ax,%eax
     mov    %eax,(%esp)
     call   6c <uint80+0x6c>
     mov    %eax,%esi
     movl   $0x1,0x4(%esp)
     movl   $0x0,(%esp)
     call   82 <uint80+0x82>
     xor    %eax,%eax
     test   %al,%al
     jne .L0
     mov    $0x1,%eax
     xor    %ecx,%ecx
     test   %cl,%cl
     jne .L1
 .L0:
     xor    %eax,%eax
 .L1:
     xor    %ebx,%ebx
     cmp    %eax,%esi
     setne  %bl
     movl   $0xfffffffe,(%esp)
     call   ad <uint80+0xad>
     mov    %ebp,%eax
     movsbl %al,%eax
     mov    %eax,0xc(%esp)
     mov    %ebx,0x8(%esp)
     movl   $0x1,0x4(%esp)
     movl   $0x0,(%esp)
     call   ce <uint80+0xce>
     xor    %eax,%eax
     test   %al,%al
     jne .L2
     mov    0x0,%eax
 .L2:
     mov    0x0,%eax
     movl   $0x1,0x4(%esp)
     movl   $0x0,(%esp)
     call   f2 <uint80+0xf2>
     xor    %eax,%eax
     test   %al,%al
     je .L3
     add    $0x1c,%esp
     pop    %esi
     pop    %edi
     pop    %ebx
     pop    %ebp
     ret
 .L3:
     mov    %edi,%eax
     movsbl %al,%eax
     mov    0x0,%ecx
     mov    %eax,(%esp)
     movl   $0x1,0x4(%esp)
     call   11b <uint80+0x11b>
     sub    $0x8,%esp
  • This is a fairly large patch. It's probably fairly self-contained, and I don't immediately see a way to break it into meaningful functional pieces. However, that makes it a priori harder to review. So it'd be really nice to make reviewers' lives easier by providing copious documentation. :)

Right, I'll sprinkle in some comments.

  • There's a decent amount of generic code that doesn't look like it should live in the pass. If it's generally useful, then it should live in one of the utility headers. If not, and it's only ever instantiated for a single type, then we can get rid of the extra template complexity. (Also, beware of limited compiler support. We need to build LLVM with some fairly old or odd compilers - MSVC 2013, first and foremost. If you haven't seen a pattern used in LLVM, it may be because we still want to be buildable with compilers that don't support it.)

.

  • A couple of pervasive differences from the LLVM coding conventions I've noticed from skimming:

.

  • We generally use camelcase for identifiers, with variables starting with an uppercase letter, and functions with lowercase. Common acronyms (like TRI) are all-caps. There are some exceptions ("standard" constructs like "int i", lowercase with underscores where we want to fit STL style, changes to old code that uses a different convention), but those are fairly rare.
  • We don't usually explicitly compare to nullptr.

Fixed.

bryant added inline comments.Aug 14 2016, 1:42 PM
lib/CodeGen/CalcSpillWeights.cpp
46–48

All this does is make copyHint accessible to the general public. Please advise if this belongs in a separate differential.

bryant updated this revision to Diff 68284.Aug 16 2016, 4:42 PM

Fix nullptr deref caused by swapped cases.

bryant updated this revision to Diff 68285.Aug 16 2016, 4:44 PM

upload the right patch.

Hi,

What is the performance impact (both runtime and compile time) of the pass?
You list results for test/CodeGen/X86, which is not usually what we use for performance testing. Could you try it on the LLVM test suite?

Moreover, like Michael said, we should try to merge this pass with something else. A possible approach would be a peephole-like optimization where we can plug in more patterns.

Cheers,
-Quentin

lib/CodeGen/CalcSpillWeights.cpp
46–48

It seems wrong to export this interface.
External users should rely on MachineRegisterInfo::getSimpleHint.

What is the problem in using MachineRegisterInfo::getSimpleHint?

Hi,

Thanks for actually reviewing this.

What is the performance impact (both runtime and compile time) of the pass?

Run-time impact, according to IACA, would be 1% to 40% (pre-ra-sched.ll) faster. Never worse. I'll measure the compile-time perf tonight. Do you have any suggestions on the preferred way to do this?

You list results for test/CodeGen/X86, which is not usually what we use for perform testing. Could you try on the LLVM test suite?

My measurements have thus far centered around the performance of generated code. Since this pass only ever runs for -march=x86 or x86-64, there would be no difference on the other architectures, improvements or otherwise. If you really want, I could run the rest of the suite, but the result would be a bunch of passes and xfails. What sort of results would you like me to record here?

Moreover, like Michael said, we should try to merge this pass with something else. A possible approach would be a peephole like optimization where we can plug more patterns.

As I've mentioned before, this depends on virtual register LiveInterval information that is lost after the rewrite pass.

Cheers,
-Quentin

lib/CodeGen/CalcSpillWeights.cpp
46–48

In cases where there is more than one hint, getSimpleHint only returns one and misses the rest.

Hi bryant,

I haven't found time to review this patch in detail yet, but here are some initial comments/questions.

Given some of your sample generated-code changes, I expected to see lots of changes in tests/CodeGen/X86. Are you planning to add those changes to this patch later?

Your output code examples have a few instances of this:

+ xorl %r10d, %r10d
+ movb 2(%rdi), %r10b

Rather than insert an xor here, you'd prefer to convert the movb to movzbl.

Converting movb to movzbl (and movw to movzwl) is essentially what FixupBWInstPass does. The author of that pass was deliberately aggressive about converting movw to movzwl but a bit more conservative about converting movb to movzbl. Here are the relevant comments:

// Only replace 8 bit loads with the zero extending versions if
// in an inner most loop and not optimizing for size. This takes
// an extra byte to encode, and provides limited performance upside.

// Always try to replace 16 bit load with 32 bit zero extending.
// Code size is the same, and there is sometimes a perf advantage
// from eliminating a false dependence on the upper portion of
// the register.

This leads to 2 questions for me.

(1) What is the code size impact of this new pass?
(2) How does the behavior of this new pass compare to simply changing X86FixupBWInsts to always optimize the 8-bit case, i.e. not check for innermost loops?

Thanks,
Dave

Hi bryant,

I haven't found time to review this patch in detail yet, but here are some initial comments/questions.

Given some of your sample generated-code changes, I expected to see lots of changes in tests/CodeGen/X86. Are you planning to add those changes to this patch later?

Yes, definitely. Will include the updates to test/*.ll in my next diff
release.

Your output code examples have a few instances of this:

+ xorl %r10d, %r10d
+ movb 2(%rdi), %r10b

Rather than insert an xor here, you'd prefer to convert the movb to movzbl.

Just to clarify, the optimal result is movzbl 2(%rdi), %r10d, not

movb 2(%rdi), %al; movzbl %al, %r10d (which is the status quo)

nor

xorl %r10d, %r10d; movb 2(%rdi), %r10b (which is what this patch does)

Correct?

Converting movb to movzbl (and movw to movzwl) is essentially what FixupBWInstPass does. The author of that pass was deliberately aggressive about converting movw to movzwl but a bit more conservative about converting to movb to movzbl. Here are the relevant comments:

// Only replace 8 bit loads with the zero extending versions if
// in an inner most loop and not optimizing for size. This takes
// an extra byte to encode, and provides limited performance upside.

I find this comment strange. The stats on x86-64 are:

21:38:41 ~/3rd/llvm> python iacapipe.py --arch 64 a.out
# Throughput: 0.50; Uops: 2; Latency: 6; Size: 7
kuper:
    mov    0x2(%rdi),%al
    movzbl %al,%r10d

# Throughput: 0.50; Uops: 1; Latency: 5; Size: 7
this_patch:
    xor    %r10d,%r10d
    mov    0x2(%rdi),%r10b

# Throughput: 0.50; Uops: 1; Latency: 5; Size: 5
optimal:
    movzbl 0x2(%rdi),%r10d

And on x86-32:

21:39:32 ~/3rd/llvm> python iacapipe.py --arch 32 a.out
# Throughput: 0.50; Uops: 2; Latency: 6; Size: 6
kuper:
    mov    0x2(%edi),%al
    movzbl %al,%eax

# Throughput: 0.50; Uops: 1; Latency: 5; Size: 5
this_patch:
    xor    %eax,%eax
    mov    0x2(%edi),%al

# Throughput: 0.50; Uops: 1; Latency: 5; Size: 4
optimal:
    movzbl 0x2(%edi),%eax

So the movb converted to movzbl is smaller and has fewer micro-ops. Am I
missing something obvious?

// Always try to replace 16 bit load with 32 bit zero extending.
// Code size is the same, and there is sometimes a perf advantage
// from eliminating a false dependence on the upper portion of
// the register.

This leads to 2 questions for me.

(1) What is the code size impact of this new pass?

I'll compile a nice histogram from the diff data in
https://reviews.llvm.org/P7213 and https://reviews.llvm.org/P7214 .

(2) How does the behavior of this new pass compare to simply changing X86FixupBWInsts to always optimize the 8-bit case, i.e. not check for innermost loops?

If the above results from IACA are to be trusted, it would be better to always
convert "movb; movzbl" to plain old movzbl. Currently, this pass would pre-empt
X86FixupBWInsts by transforming it into "xor; movb" before the later pass has a
chance to see the movb-movzbl pattern. It would be easy to add special cases
patterns that this pass should leave untouched.

Thanks,
Dave

Please let me know if additional clarification is needed.

bryant added inline comments.Aug 22 2016, 7:16 PM
lib/CodeGen/CalcSpillWeights.cpp
46–48

Here is an example:

#include <stdint.h>

typedef struct {
  uint32_t low;
  uint32_t high;
} u32pair;

extern void use8(uint8_t);

uint32_t f(uint64_t *m, uint32_t a, uint32_t b) {
  u32pair src = {a > 0, b > 0};
  use8(a > 0);
  u32pair rv;
  asm volatile("cmpxchg8b %2"
               : "+a"(rv.low), "+d"(rv.high), "+m"(m)
               : "b"(src.low), "c"(src.high)
               : "flags");
  return rv.low > 0 && rv.high > 0;
}

pre-regalloc machineinstrs:

0B      BB#0: derived from LLVM BB %3
            Live Ins: %RDI %ESI %EDX
16B             %vreg2<def> = COPY %EDX; GR32:%vreg2
32B             %vreg1<def> = COPY %ESI; GR32:%vreg1
48B             %vreg0<def> = COPY %RDI; GR64:%vreg0
64B             MOV64mr <fi#0>, 1, %noreg, 0, %noreg, %vreg0; mem:ST8[%4](tbaa=!2) GR64:%vreg0
80B             TEST32rr %vreg1, %vreg1, %EFLAGS<imp-def>; GR32:%vreg1
96B             %vreg5<def> = SETNEr %EFLAGS<imp-use,kill>; GR8:%vreg5
112B            %vreg6<def> = MOVZX32rr8 %vreg5; GR32:%vreg6 GR8:%vreg5
128B            TEST32rr %vreg2, %vreg2, %EFLAGS<imp-def>; GR32:%vreg2
144B            %vreg7<def> = SETNEr %EFLAGS<imp-use,kill>; GR8:%vreg7
160B            %vreg8<def> = MOVZX32rr8 %vreg7; GR32:%vreg8 GR8:%vreg7
176B            ADJCALLSTACKDOWN64 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
192B   ==>      %EDI<def> = COPY %vreg6; GR32:%vreg6
208B            CALL64pcrel32 <ga:@use8>, <regmask %BH %BL %BP %BPL %BX %EBP %EBX %RBP %RBX %R12 %R13 %R14 %R15 %R12B %R13B %R14B %R15B %R12D %R13D %R14D %R15D %R12W %R13W %R14W %R15W>, %RSP<imp-use>, %EDI<imp-use>, %RSP<imp-def>
224B            ADJCALLSTACKUP64 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
240B   ==>      %EBX<def> = COPY %vreg6; GR32:%vreg6
256B            %ECX<def> = COPY %vreg8; GR32:%vreg8
272B            INLINEASM <es:cmpxchg8b $2> [sideeffect] [mayload] [maystore] [attdialect], $0:[regdef], %EAX<imp-def,tied>, $1:[regdef], %EDX<imp-def,tied>, $2:[mem:m], <fi#0>, 1, %noreg, 0, %noreg, $3:[reguse], %EBX<kill>, $4:[reguse], %ECX<kill>, $5:[reguse tiedto:$0], %EAX<undef,tied3>, $6:[reguse tiedto:$1], %EDX<undef,tied5>, $7:[mem:m], <fi#0>, 1, %noreg, 0, %noreg, $8:[clobber], %EFLAGS<earlyclobber,imp-def,dead>, $9:[clobber], %EFLAGS<earlyclobber,imp-def,dead>, <!5>
276B            %vreg12<def> = COPY %EDX; GR32:%vreg12
280B            %vreg11<def> = COPY %EAX; GR32:%vreg11
320B            TEST32rr %vreg11, %vreg11, %EFLAGS<imp-def>; GR32:%vreg11
336B            %vreg13<def> = SETNEr %EFLAGS<imp-use,kill>; GR8:%vreg13
352B            TEST32rr %vreg12, %vreg12, %EFLAGS<imp-def>; GR32:%vreg12
368B            %vreg15<def> = SETNEr %EFLAGS<imp-use,kill>; GR8:%vreg15
400B            %vreg15<def,tied1> = AND8rr %vreg15<tied0>, %vreg13, %EFLAGS<imp-def,dead>; GR8:%vreg15,%vreg13
416B            %vreg16<def> = MOVZX32rr8 %vreg15; GR32:%vreg16 GR8:%vreg15
432B            %EAX<def> = COPY %vreg16; GR32:%vreg16
448B            RET 0, %EAX

%vreg6 has two hints: EDI and EBX. However, getSimpleHint shows that
MachineRegisterInfo only records EDI:

selectOrSplit GR32:%vreg6 [112r,240r:0)  0@112r w=5.738636e-03
hints: %EDI
missed hint %EDI

Just to clarify, the optimal result is movzbl 2(%rdi), %r10d, not

movb 2(%rdi), %al; movzbl %al, %r10d (which is the status quo)

nor

xorl %r10d, %r10d; movb 2(%rdi), %r10b (which is what this patch does)

Correct?

Yes, exactly.

So the movb converted to movzbl is smaller and has fewer micro-ops. Am I
missing something obvious?

AHA! Yes, you are right that this conversion is always profitable:

mov    0x2(%edi),%al
movzbl %al,%eax

TO

movzbl 0x2(%edi),%eax

but the X86FixupBWInsts pass will also do this:

mov    0x2(%edi),%al

TO

movzbl 0x2(%edi),%eax

In other words, it will convert the movb to movzbl even if the result of the movb is never zero-extended. Doing so is often a win for performance due to the partial-register dependence of the movb, but movzbl has a larger encoding.
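A minimal sketch of that case (hypothetical function, assuming typical codegen): the load's result is never widened in the source, so no movb-movzbl pair exists for the new pass to match, yet rewriting the movb as movzbl can still break the false dependence on the destination's upper bits:

```c
#include <assert.h>

/* The byte result is returned as a byte, so no zero-extend appears
 * in the IR. X86FixupBWInsts may still widen the movb to movzbl
 * purely to avoid a partial-register dependence, at the cost of one
 * extra encoding byte. */
unsigned char third_byte(const unsigned char *p) { return p[2]; }
```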

I assume you are saying that your new pass will only transform 8-bit moves that are subsequently zero extended? In that case, you can disregard my code size concern for now.

Ultimately, we will want to conditionally optimize the 8-bit instructions that do not feed into a zero extend. (That applies both to the movb --> movzbl transformation and to inserting xor before instructions like SETcc.) It is just a bit harder, due to the tradeoff that needs to be made. I made the same comment on Michael's FixupSetCC pass - just haven't gotten around to experimenting with it yet.

I'll compile a nice histogram from the diff data in
https://reviews.llvm.org/P7213 and https://reviews.llvm.org/P7214 .

That would be very nice! Like I said, though, I am less worried about the code size impact if you are only optimizing instructions that are already being zero extended.

This does mean, however, that this pass will not be an adequate replacement for FixupBWInsts until we add the raw movb --> movzbl transform.

Thanks,
Dave