This is an archive of the discontinued LLVM Phabricator instance.

Differential D101970

[X86FixupLEAs] Transform the sequence LEA/SUB to SUB/SUB
ClosedPublic

Authored by Carrot on May 5 2021, 7:43 PM.

Details

Reviewers

craig.topper
spatel
RKSimon
lebedev.ri
nikic

Commits

rG1b748faf2bae: [X86FixupLEAs] Transform the sequence LEA/SUB to SUB/SUB
rG528bc10e95d5: [X86FixupLEAs] Transform the sequence LEA/SUB to SUB/SUB

Summary

This patch transforms the sequence

    lea (reg1, reg2), reg3
    sub reg3, reg4

to two sub instructions

    sub reg1, reg4
    sub reg2, reg4

A similar optimization can also be applied to the LEA/ADD sequence.
The modification to TwoAddressInstructionPass ensures that the operands of the ADD instruction have the expected order (the dest register of the LEA should be the src register of the ADD).
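As a quick sanity check of the arithmetic behind the rewrite, the following sketch models both sequences on 32-bit values. The function names are illustrative and not taken from the patch, and AT&T operand order is assumed, so `sub reg3, reg4` computes reg4 = reg4 - reg3. It compares result values only and says nothing about EFLAGS.

```cpp
#include <cstdint>

// Value produced by the original sequence:
//   lea (reg1, reg2), reg3
//   sub reg3, reg4
constexpr std::uint32_t leaThenSub(std::uint32_t Reg1, std::uint32_t Reg2,
                                   std::uint32_t Reg4) {
  // Unsigned arithmetic wraps modulo 2^32, like the hardware registers.
  return static_cast<std::uint32_t>(Reg4 - static_cast<std::uint32_t>(Reg1 + Reg2));
}

// Value produced by the rewritten sequence:
//   sub reg1, reg4
//   sub reg2, reg4
constexpr std::uint32_t subThenSub(std::uint32_t Reg1, std::uint32_t Reg2,
                                   std::uint32_t Reg4) {
  return static_cast<std::uint32_t>(static_cast<std::uint32_t>(Reg4 - Reg1) - Reg2);
}
```

Because subtraction is associative modulo 2^32, both functions return the same value for every input, including inputs where an intermediate result wraps around.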

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

Carrot created this revision. May 5 2021, 7:43 PM

Herald added subscribers: mstorsjo, pengfei, hiraditya. May 5 2021, 7:43 PM

Carrot requested review of this revision. May 5 2021, 7:43 PM

Herald added a project: Restricted Project. May 5 2021, 7:43 PM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B102903: Diff 343272. May 5 2021, 8:18 PM

RKSimon added a reviewer: RKSimon. May 6 2021, 1:53 AM

RKSimon added inline comments. May 6 2021, 6:11 AM

llvm/include/llvm/CodeGen/TargetInstrInfo.h
462	Add doxygen description
llvm/lib/Target/X86/X86FixupLEAs.cpp
454	for (unsigned I = 0, E = CurInst->getNumOperands(); I != E; ++I)
llvm/lib/Target/X86/X86InstrInfo.cpp
2708	Reduce scope: if (MachineOperand *Op = MRI.getOneDef(Reg1))
2718	Reduce scope: if (MachineOperand *Op = MRI.getOneDef(Reg2))
llvm/test/CodeGen/X86/lea-opt2.ll
2	I think it's OK to pre-commit this (with a suitable extra FIXME/TODO explanation comment) and then rebase the patch to show the diffs.
145	Are the nsw flags relevant?

Carrot added inline comments. May 6 2021, 9:59 AM

llvm/test/CodeGen/X86/lea-opt2.ll
2	Will do.
145	It should not be relevant.

I think it is good when the patch's description explains not just what the patch does, but also why it does that.

Carrot mentioned this in D102010: Pre-commit test case for D101970. May 6 2021, 10:58 AM

In D101970#2742388, @lebedev.ri wrote:

I think it is good when the patch's description explains not just what the patch does, but also why it does that.

Same as other LEA -> ALU optimizations in optTwoAddrLEA. On old architectures such as HSW or SKL, fewer issue ports have an LEA functional unit than an ALU, so LEA instructions may be delayed by issue port contention. On newer architectures such as ICL or TGL, all four ports can issue LEA instructions, so there is no such problem; the transformation still doesn't hurt there, because none of the optimizations in optTwoAddrLEA generate extra instructions.

Carrot mentioned this in rGa0fed635fe17: Pre-commit test case for D101970. May 10 2021, 2:52 PM

Carrot updated this revision to Diff 344288. May 10 2021, 11:13 PM

Carrot marked 3 inline comments as done.

Harbormaster completed remote builds in B103661: Diff 344288. May 10 2021, 11:54 PM

RKSimon added a reviewer: lebedev.ri. May 11 2021, 6:07 AM

Can you try this patch with http://llvm-compile-time-tracker.com/ ?

In D101970#2750517, @xbolva00 wrote:

Can you try this patch with http://llvm-compile-time-tracker.com/ ?

I studied the page for a while but couldn't find a way to test the compile time of a patch; I can only find the comparison between two commits.

https://llvm-compile-time-tracker.com/about.php

Assuming you have a public llvm fork on your github, @nikic can set you up - you'd need to create a branch with 'perf/' and it would get picked up automatically.

@xbolva00 what in particular are you concerned about? Avoiding LEA can be useful but I doubt there will be much effect here.

Whether we want to run this transformation on newer archs. As you said, it brings us nothing there, so at least we should check that it is free in terms of compile time in the backend.

Compile-time: https://llvm-compile-time-tracker.com/compare.php?from=15565403722ec37d8b1a3ee8625ee2e8efcd96ee&to=ebe267e3e1133d04a80a40357ebe93a2fe6b018b&stat=instructions Looks neutral; the NewPM-O3 SPASS regression is most likely just noise.

RKSimon added inline comments. May 12 2021, 2:36 AM

llvm/lib/Target/X86/X86FixupLEAs.cpp
420	Please can you fix all these case style warnings: MachineOperand &Opnd = CurInst->getOperand(I);

Carrot updated this revision to Diff 344925. May 12 2021, 1:06 PM

Carrot marked an inline comment as done.

Harbormaster completed remote builds in B104120: Diff 344925. May 12 2021, 2:17 PM

RKSimon added inline comments. May 14 2021, 7:18 AM

llvm/lib/Target/X86/X86FixupLEAs.cpp
403	const int InstrDistanceThreshold = 5;

Carrot updated this revision to Diff 345584. May 14 2021, 4:21 PM

Carrot marked an inline comment as done.

Harbormaster completed remote builds in B104611: Diff 345584. May 14 2021, 5:39 PM

LGTM cheers

This revision is now accepted and ready to land. May 15 2021, 4:16 AM

Closed by commit rG528bc10e95d5: [X86FixupLEAs] Transform the sequence LEA/SUB to SUB/SUB (authored by Carrot). May 18 2021, 6:05 PM

This revision was automatically updated to reflect the committed changes.

Carrot added a commit: rG528bc10e95d5: [X86FixupLEAs] Transform the sequence LEA/SUB to SUB/SUB.

Just a heads up, my auto-bisecting multi-stage cron job has identified this change as the source of a second stage regression in a bunch of clang unit tests. I'm about to verify by hand, but it'll take a while.

In D101970#2768229, @davezarzycki wrote:

Just a heads up, my auto-bisecting multi-stage cron job has identified this change as the source of a second stage regression in a bunch of clang unit tests. I'm about to verify by hand, but it'll take a while.

perf or correctness regression?

Lots of crashes like this:

FAIL: Clang-Unit :: Format/./FormatTests/FormatTest.FormatsCompactNamespaces (23265 of 76779)
******************** TEST 'Clang-Unit :: Format/./FormatTests/FormatTest.FormatsCompactNamespaces' FAILED ********************
Note: Google Test filter = FormatTest.FormatsCompactNamespaces
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from FormatTest
[ RUN      ] FormatTest.FormatsCompactNamespaces
FormatTests: /home/dave/ro_s/lp/clang/lib/Basic/SourceManager.cpp:865: clang::FileID clang::SourceManager::getFileIDLoaded(unsigned int) const: Assertion `0 && "Invalid SLocOffset or bad function choice"' failed.
 #0 0x0000000000604313 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/tmp/_update_lc/t/tools/clang/unittests/Format/./FormatTests+0x604313)
 #1 0x0000000000602962 llvm::sys::RunSignalHandlers() (/tmp/_update_lc/t/tools/clang/unittests/Format/./FormatTests+0x602962)
 #2 0x0000000000604d8a SignalHandler(int) Signals.cpp:0:0
 #3 0x00007ffff7fa4a20 (/lib64/libpthread.so.0+0x13a20)
 #4 0x00007ffff7b392a2 raise (/lib64/libc.so.6+0x3d2a2)
 #5 0x00007ffff7b228a4 abort (/lib64/libc.so.6+0x268a4)
 #6 0x00007ffff7b22789 (/lib64/libc.so.6+0x26789)
 #7 0x00007ffff7b31a16 (/lib64/libc.so.6+0x35a16)
 #8 0x000000000064922c clang::SourceManager::getFileIDLoaded(unsigned int) const (/tmp/_update_lc/t/tools/clang/unittests/Format/./FormatTests+0x64922c)
 #9 0x00000000003e734c clang::SourceManager::getDecomposedLoc(clang::SourceLocation) const CleanupTest.cpp:0:0
#10 0x000000000064b231 clang::SourceManager::isBeforeInTranslationUnit(clang::SourceLocation, clang::SourceLocation) const (/tmp/_update_lc/t/tools/clang/unittests/Format/./FormatTests+0x64b231)
#11 0x000000000066c05e clang::format::AffectedRangeManager::computeAffectedLines(llvm::SmallVectorImpl<clang::format::AnnotatedLine*>&) (/tmp/_update_lc/t/tools/clang/unittests/Format/./FormatTests+0x66c05e)
#12 0x00000000006ae393 clang::format::UsingDeclarationsSorter::analyze(clang::format::TokenAnnotator&, llvm::SmallVectorImpl<clang::format::AnnotatedLine*>&, clang::format::FormatTokenLexer&) (/tmp/_update_lc/t/tools/clang/unittests/Format/./FormatTests+0x6ae393)
#13 0x0000000000690d1a clang::format::TokenAnalyzer::process() (/tmp/_update_lc/t/tools/clang/unittests/Format/./FormatTests+0x690d1a)
#14 0x0000000000665643 std::__1::__function::__func<clang::format::internal::reformat(clang::format::FormatStyle const&, llvm::StringRef, llvm::ArrayRef<clang::tooling::Range>, unsigned int, unsigned int, unsigned int, llvm::StringRef, clang::format::FormattingAttemptStatus*)::$_4, std::__1::allocator<clang::format::internal::reformat(clang::format::FormatStyle const&, llvm::StringRef, llvm::ArrayRef<clang::tooling::Range>, unsigned int, unsigned int, unsigned int, llvm::StringRef, clang::format::FormattingAttemptStatus*)::$_4>, std::__1::pair<clang::tooling::Replacements, unsigned int> (clang::format::Environment const&)>::operator()(clang::format::Environment const&) Format.cpp:0:0
#15 0x0000000000655d6e clang::format::internal::reformat(clang::format::FormatStyle const&, llvm::StringRef, llvm::ArrayRef<clang::tooling::Range>, unsigned int, unsigned int, unsigned int, llvm::StringRef, clang::format::FormattingAttemptStatus*) (/tmp/_update_lc/t/tools/clang/unittests/Format/./FormatTests+0x655d6e)
#16 0x000000000065640f clang::format::reformat(clang::format::FormatStyle const&, llvm::StringRef, llvm::ArrayRef<clang::tooling::Range>, llvm::StringRef, clang::format::FormattingAttemptStatus*) (/tmp/_update_lc/t/tools/clang/unittests/Format/./FormatTests+0x65640f)
#17 0x00000000003ef27f clang::format::(anonymous namespace)::FormatTest::format(llvm::StringRef, clang::format::FormatStyle const&, clang::format::(anonymous namespace)::FormatTest::StatusCheck) FormatTest.cpp:0:0
#18 0x0000000000407eed clang::format::(anonymous namespace)::FormatTest_FormatsCompactNamespaces_Test::TestBody() FormatTest.cpp:0:0
#19 0x000000000060cc53 testing::Test::Run() (/tmp/_update_lc/t/tools/clang/unittests/Format/./FormatTests+0x60cc53)
#20 0x000000000060dc16 testing::TestInfo::Run() (/tmp/_update_lc/t/tools/clang/unittests/Format/./FormatTests+0x60dc16)
#21 0x000000000060e470 testing::TestSuite::Run() (/tmp/_update_lc/t/tools/clang/unittests/Format/./FormatTests+0x60e470)
#22 0x000000000061bb23 testing::internal::UnitTestImpl::RunAllTests() (/tmp/_update_lc/t/tools/clang/unittests/Format/./FormatTests+0x61bb23)
#23 0x000000000061b59d testing::UnitTest::Run() (/tmp/_update_lc/t/tools/clang/unittests/Format/./FormatTests+0x61b59d)
#24 0x000000000060555b main (/tmp/_update_lc/t/tools/clang/unittests/Format/./FormatTests+0x60555b)
#25 0x00007ffff7b23b75 __libc_start_main (/lib64/libc.so.6+0x27b75)
#26 0x00000000003ddd8e _start (/tmp/_update_lc/t/tools/clang/unittests/Format/./FormatTests+0x3ddd8e)

(IOW the stage 2 is getting miscompiled)

Verified. Second stage miscompile. If it matters (and it sometimes does), my first stage is built without asserts.

RKSimon added a reverting change: rG707fc2e2f227: Revert rG528bc10e95d5f9d6a338f9bab5e91d7265d1cf05 : "[X86FixupLEAs] Transform…. May 19 2021, 7:01 AM

Reverted in rG707fc2e2f227ec7b367273d0906b953bbae41392

This revision is now accepted and ready to land. May 19 2021, 7:07 AM

RKSimon requested changes to this revision. May 19 2021, 7:08 AM

This revision now requires changes to proceed. May 19 2021, 7:08 AM

Thanks! Also, I feel fairly confident about this. My auto-bisecting cron job is really paranoid. For example, it always does a multi-stage test to verify that the commit *before* the commit blamed by git-bisect actually works. That's what I was waiting for. Also, I then verified by trying to move one commit ahead in the history to this commit to get an A/B test and I then confirmed that this is the regression. Thanks again for reverting this.

Thanks for the report!

The problem is in

-       lea    (%rax,%rcx,1),%ebp
        mov    %esi,%edi
-       sub    %ebp,%edi
-       jbe    <_ZNK5clang7tooling12Replacements5mergeERKS1_+0x279>
+       sub    %eax,%edi
+       sub    %ecx,%edi
+       jbe    <_ZNK5clang7tooling12Replacements5mergeERKS1_+0x278>

Previously I thought X - (Y + Z) generates the same flags as X - Y - Z; unfortunately, that is not true when X - Y overflows.
In one execution of the code snippet, I got
rsi 0x3 3
rax 0x5 5
rcx 0x0 0
Before the transformation the last sub generates CF=1.
After the transformation the last sub generates CF=0.
So it causes wrong behavior for the following branch.
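The register values above can be replayed in a small model of a 32-bit SUB. This is a sketch under stated assumptions: the type and function names are made up for illustration, only CF and ZF are modelled (the two flags JBE reads), and the LEA result is treated as a 32-bit wrapping sum.

```cpp
#include <cstdint>

struct SubResult {
  std::uint32_t Value;
  bool CF; // carry flag: set when the subtraction needs an unsigned borrow
  bool ZF; // zero flag: set when the result is zero
};

// Model of a 32-bit x86 SUB computing Dst - Src.
constexpr SubResult sub32(std::uint32_t Dst, std::uint32_t Src) {
  return SubResult{static_cast<std::uint32_t>(Dst - Src), Dst < Src, Dst == Src};
}

// JBE is taken when CF or ZF is set.
constexpr bool jbeTaken(SubResult R) { return R.CF || R.ZF; }

// Flags seen by the branch before the transformation: X - (Y + Z).
constexpr SubResult flagsBefore(std::uint32_t X, std::uint32_t Y, std::uint32_t Z) {
  return sub32(X, static_cast<std::uint32_t>(Y + Z));
}

// Flags seen by the branch after the transformation: (X - Y) - Z.
// Only the second SUB's flags reach the branch.
constexpr SubResult flagsAfter(std::uint32_t X, std::uint32_t Y, std::uint32_t Z) {
  return sub32(sub32(X, Y).Value, Z);
}
```

With X = 3, Y = 5, Z = 0, `flagsBefore` reports CF=1 and the branch is taken, while `flagsAfter` reports CF=0 and ZF=0 and the branch falls through, even though both compute the same value 0xFFFFFFFE.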

This patch should fix the problem.

@RKSimon, could you please take another look.

@davezarzycki, could you help to test this patch with your configuration?

thanks!

Harbormaster completed remote builds in B105977: Diff 347480. May 24 2021, 2:03 PM

I can check it, but I'm a bit concerned about the comment and what it implies. The problem is NOT overflow, but that comparison operations need to be redistributed during this optimization (or dependent comparisons need to disable the optimization). So in pseudocode, the original code that miscompiled looks like this:

if (x - (y + z) <= 0) goto something; // where <= is an unsigned comparison but it really doesn't matter.

So redistributing the arithmetic requires redistributing the comparison.

auto t1 = x - y
if (t1 <= 0) goto something;
if (t1 - z <= 0) goto something;

Does the new patch redistribute comparisons?

Actually, it is more complicated, and you're right that it also can involve overflow. I'm not sure if this is a trivial optimization when dependent comparisons are involved.

In D101970#2779338, @davezarzycki wrote:

Actually, it is more complicated, and you're right that it also can involve overflow. I'm not sure if this is a trivial optimization when dependent comparisons are involved.

The new changes in function searchALUInst do exactly this check.

// X - (Y + Z) may generate different flags than (X - Y) - Z when there
// is overflow. So we can't change the alu instruction if the flags
// register is live.
MachineBasicBlock::iterator NextI = std::next(CurInst);
if (MBB.computeRegisterLiveness(TRI, X86::EFLAGS, NextI) !=
    MachineBasicBlock::LQR_Dead)
  return MachineBasicBlock::iterator();

If the flags of the SUB/ADD are live, they will be used by a following comparison, conditional branch, or other instruction, so we give up the transformation.

In other words, we can do the transformation only when the flags of the ALU instruction are not used by following instructions (comparison or conditional branch).
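The idea of the guard can be sketched with a toy model that is independent of LLVM. `FlagInstr`, `flagsDeadAfter`, and the `FlagsLiveOut` parameter are hypothetical names for illustration; the real check in the patch goes through LLVM's liveness query shown above and is not reproduced here.

```cpp
#include <cstddef>

// Toy instruction: records only how it interacts with the flags register.
struct FlagInstr {
  bool ReadsFlags;  // e.g. a conditional branch
  bool WritesFlags; // e.g. ADD, SUB, CMP
};

// Returns true when the flags written by Block[AluIdx] are never read.
// Scans forward until some instruction reads the flags (live) or
// overwrites them (dead). FlagsLiveOut states whether a successor block
// may still read the flags once the end of this block is reached.
constexpr bool flagsDeadAfter(const FlagInstr *Block, std::size_t Size,
                              std::size_t AluIdx, bool FlagsLiveOut) {
  for (std::size_t I = AluIdx + 1; I < Size; ++I) {
    if (Block[I].ReadsFlags)
      return false; // a later instruction consumes the ALU's flags
    if (Block[I].WritesFlags)
      return true; // flags are overwritten before anyone reads them
  }
  return !FlagsLiveOut; // reached the end of the block
}
```

In this model the SUB followed by JBE from the miscompiled snippet yields `false`, so the LEA/SUB pair would be left alone.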

I've verified that this patch no longer causes my multi-stage cron job to regress. Thanks!

craig.topper added inline comments. May 26 2021, 10:18 AM

llvm/lib/Target/X86/X86FixupLEAs.cpp
437	Why can't we just check the Dead flag on the EFLAGS def of the ALU op?

Check the dead EFLAGS def directly instead of calling computeRegisterLiveness.
Add implicit dead def EFLAGS to new alu instructions.

llvm/lib/Target/X86/X86FixupLEAs.cpp
437	Good catch! I copied the code from optTwoAddrLEA.

Harbormaster completed remote builds in B106377: Diff 348085. May 26 2021, 2:46 PM

craig.topper added inline comments. May 26 2021, 5:04 PM

llvm/lib/Target/X86/X86FixupLEAs.cpp
437	Can we use MachineInstr::registerDefIsDead
515	This creates a second EFLAGS implicit def. The BuildMI would have already created one based on the MCInstrDesc. I checked the print-after-all output and saw this on the lea-opt2.ll test1:

    $ecx = SUB32rr $ecx(tied-def 0), $edx, implicit-def $eflags, implicit-def dead $eflags
    $ecx = SUB32rr $ecx(tied-def 0), $eax, implicit-def $eflags, implicit-def dead $eflags

I added -verify-machineinstrs, but it doesn't detect this as an issue. That probably doesn't work well with findRegisterDefOperandIdx. I think what you want to do is this after creating the instruction:

    NewMI1->addRegisterDead(X86::EFLAGS, TRI);

That will scan the operands and mark the existing EFLAGS operand as dead.
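The difference between appending a second dead def and marking the existing one dead can be illustrated with a toy operand list. Everything here (`ToyOperand`, `ToyMI`, `ToyEFLAGS`, and the three helpers) is a hypothetical stand-in, not LLVM's MachineInstr API; it only mirrors the behaviour described in the comment.

```cpp
#include <cstddef>

struct ToyOperand {
  int Reg;
  bool IsDef;
  bool IsDead;
};

struct ToyMI {
  ToyOperand Ops[4];
  std::size_t NumOps;
};

constexpr int ToyEFLAGS = 99; // hypothetical register id for the flags

// Counts how many def operands of Reg the instruction carries.
constexpr std::size_t countDefs(const ToyMI &MI, int Reg) {
  std::size_t N = 0;
  for (std::size_t I = 0; I < MI.NumOps; ++I)
    if (MI.Ops[I].IsDef && MI.Ops[I].Reg == Reg)
      ++N;
  return N;
}

// Appends a fresh dead def; if a def already exists this duplicates it.
constexpr ToyMI appendDeadDef(ToyMI MI, int Reg) {
  if (MI.NumOps < 4) {
    MI.Ops[MI.NumOps].Reg = Reg;
    MI.Ops[MI.NumOps].IsDef = true;
    MI.Ops[MI.NumOps].IsDead = true;
    ++MI.NumOps;
  }
  return MI;
}

// Scans the operands and marks the existing def of Reg dead in place.
constexpr ToyMI markExistingDefDead(ToyMI MI, int Reg) {
  for (std::size_t I = 0; I < MI.NumOps; ++I)
    if (MI.Ops[I].IsDef && MI.Ops[I].Reg == Reg)
      MI.Ops[I].IsDead = true;
  return MI;
}
```

Starting from an instruction that already has one flags def, the append variant ends up with two defs of the same register, while the in-place variant keeps a single def that is now dead.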

Carrot updated this revision to Diff 348161. May 26 2021, 10:06 PM

Carrot marked 2 inline comments as done.

Harbormaster completed remote builds in B106433: Diff 348161. May 26 2021, 10:36 PM

craig.topper added inline comments. May 27 2021, 11:04 AM

llvm/lib/Target/X86/X86InstrInfo.cpp
2707	I think you might be able to use getUniqueVRegDef which will return the MachineInstr directly

Carrot updated this revision to Diff 348354. May 27 2021, 12:24 PM

Carrot marked an inline comment as done.

LGTM

Harbormaster completed remote builds in B106578: Diff 348354. May 27 2021, 1:02 PM

LGTM to unblock

This revision is now accepted and ready to land. May 28 2021, 3:16 AM

Closed by commit rG1b748faf2bae: [X86FixupLEAs] Transform the sequence LEA/SUB to SUB/SUB (authored by Carrot). Jun 1 2021, 10:33 AM

This revision was automatically updated to reflect the committed changes.

Carrot added a commit: rG1b748faf2bae: [X86FixupLEAs] Transform the sequence LEA/SUB to SUB/SUB.

We are still observing a miscompile caused by the newer version of this patch. I'll try to provide you a repro.

Filed https://bugs.llvm.org/show_bug.cgi?id=50615. The original repro is in Java, but I hope the logs there are enough to figure out what's going on. The log shows that the described transform did apply.

I suggest to revert and then investigate.

In D101970#2804705, @mkazantsev wrote:

We are still observing a miscompile caused by the newer version of this patch. I'll try to provide you a repro.

Surely you can provide the .ll after middle-end optimizations?

In D101970#2805039, @lebedev.ri wrote:

In D101970#2804705, @mkazantsev wrote:

We are still observing a miscompile caused by the newer version of this patch. I'll try to provide you a repro.

Surely you can provide the .ll after middle-end optimizations?

Attached repro runnable with upstream llc to bug. Seems that it reproduces the same effect.

Thanks for the report!
I guess function searchALUInst should also check overlapped register usage of DestReg.

I'm a bit surprised that it wasn't reverted. The default policy is to revert whatever is broken and then fix it. We have numerous internal failures & broken CI cycles because of this, and the fix is not merged yet (and it is not clear when it will be merged). @Carrot could you please revert this and reland it along with the fix?

I also suspect this patch is causing the following failure (with -mllvm -verify-machineinstrs ) on GreenDragon (http://green.lab.llvm.org/green/job/test-suite-verify-machineinstrs-x86_64-O3/9579/)

/Users/buildslave/jenkins/workspace/test-suite-verify-machineinstrs-x86_64-O3/test-suite-build/tools/timeit --summary MultiSource/Applications/JM/ldecod/CMakeFiles/ldecod.dir/block.c.o.time /Users/buildslave/jenkins/workspace/test-suite-verify-machineinstrs-x86_64-O3/compiler/bin/clang -DNDEBUG  -B /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin    -Wno-unused-command-line-argument -mllvm -verify-machineinstrs -O3 -arch x86_64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.15.sdk   -w -Werror=date-time -fcommon -D__USE_LARGEFILE64 -D_FILE_OFFSET_BITS=64 -MD -MT MultiSource/Applications/JM/ldecod/CMakeFiles/ldecod.dir/block.c.o -MF MultiSource/Applications/JM/ldecod/CMakeFiles/ldecod.dir/block.c.o.d -o MultiSource/Applications/JM/ldecod/CMakeFiles/ldecod.dir/block.c.o   -c /Users/buildslave/jenkins/workspace/test-suite-verify-machineinstrs-x86_64-O3/test-suite/MultiSource/Applications/JM/ldecod/block.c
Fatal Error: error in backend: Found 2 machine code errors.
*** Bad machine code: Using an undefined physical register ***
- function:    itrans_sp
- basic block: %bb.0 entry (0x7fe93e957ac8)
- instruction: $ecx = SUB32rr $ecx(tied-def 0), $esi, implicit-def dead $eflags
- operand 2:   $esi
fatal error: error in backend: Found 2 machine code errors.
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
Error: clang frontend command failed with exit code 70 (use -v to see invocation)
25 clang         0x000000011115a63d clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&) const + 221
26 clang         0x000000011115ab9d clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::__1::pair<int, clang::driver::Command const*> >&) const + 125
27 clang         0x00000001111715ec clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::__1::pair<int, clang::driver::Command const*> >&) + 204
28 clang         0x000000010e89a145 main + 10309
29 libdyld.dylib 0x00007fff67a6fcc9 start + 1
clang-13: error: clang frontend command failed with exit code 70 (use -v to see invocation)
clang version 13.0.0 (https://github.com/llvm/llvm-project.git f0a68bbc967ab851e9b678feaf9015a2bfadb12e)
Target: x86_64-apple-darwin19.5.0

It would be great if you could take a look and revert the patch if the investigation will take longer.

The fix has been committed as f35bcea1d4748889b8240defdf00cb7a71cbe070.

fhahn added a reverting change: rG5cd66420ccb1: Revert "[X86FixupLEAs] Transform the sequence LEA/SUB to SUB/SUB". Jun 12 2021, 3:44 AM

In D101970#2813643, @Carrot wrote:

The fix has been committed as f35bcea1d4748889b8240defdf00cb7a71cbe070.

Unfortunately this still creates invalid Machine IR for the llvm-test-suite on X86, as I mentioned above. E.g. see http://green.lab.llvm.org/green/job/test-suite-verify-machineinstrs-x86_64-O3/9585/

I reverted the fix and the patch in 5cd66420ccb1, 1b748faf2bae to get the public bots back to green.

To reproduce, run llc -verify-machineinstrs on the IR below on X86:

target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"

%struct.widget = type { i32, i32, i32, i32, i32*, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, [16 x [16 x i16]], [6 x [32 x i32]], [16 x [16 x i32]], [4 x [12 x [4 x [4 x i32]]]], [16 x i32], i8**, i32*, i32***, i32**, i32, i32, i32, i32, %struct.baz*, %struct.wobble.1*, i32, i32, i32, i32, i32, i32, %struct.quux.2*, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, [3 x i32], i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32***, i32***, i32****, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, [3 x [2 x i32]], [3 x [2 x i32]], i32, i32, i64, i64, %struct.zot.3, %struct.zot.3, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32 }
%struct.baz = type { i32, i32, i32, i32, i32, i32, i32, i32, i32, %struct.snork*, %struct.wombat.0*, %struct.wobble*, i32, i32*, i32*, i32*, i32, i32*, i32*, i32*, i32 (%struct.widget*, %struct.eggs*)*, i32, i32, i32, i32 }
%struct.snork = type { %struct.spam*, %struct.zot, i32 (%struct.wombat*, %struct.widget*, %struct.snork*)* }
%struct.spam = type { i32, i32, i32, i32, i8*, i32 }
%struct.zot = type { i32, i32, i32, i32, i32, i8*, i32* }
%struct.wombat = type { i32, i32, i32, i32, i32, i32, i32, i32, void (i32, i32, i32*, i32*)*, void (%struct.wombat*, %struct.widget*, %struct.zot*)* }
%struct.wombat.0 = type { [4 x [11 x %struct.quux]], [2 x [9 x %struct.quux]], [2 x [10 x %struct.quux]], [2 x [6 x %struct.quux]], [4 x %struct.quux], [4 x %struct.quux], [3 x %struct.quux] }
%struct.quux = type { i16, i8 }
%struct.wobble = type { [2 x %struct.quux], [4 x %struct.quux], [3 x [4 x %struct.quux]], [10 x [4 x %struct.quux]], [10 x [15 x %struct.quux]], [10 x [15 x %struct.quux]], [10 x [5 x %struct.quux]], [10 x [5 x %struct.quux]], [10 x [15 x %struct.quux]], [10 x [15 x %struct.quux]] }
%struct.eggs = type { [1000 x i8], [1000 x i8], [1000 x i8], i32, i32, i32, i32, i32, i32, i32, i32 }
%struct.wobble.1 = type { i32, [2 x i32], i32, i32, %struct.wobble.1*, %struct.wobble.1*, i32, [2 x [4 x [4 x [2 x i32]]]], i32, i64, i64, i32, i32, [4 x i8], [4 x i8], i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32 }
%struct.quux.2 = type { i32, i32, i32, i32, i32, %struct.quux.2* }
%struct.zot.3 = type { i64, i16, i16, i16 }

define void @blam(%struct.widget* %arg, i32 %arg1) local_unnamed_addr {
bb:
  %tmp = load i32, i32* undef, align 4
  %tmp2 = sdiv i32 %tmp, 6
  %tmp3 = sdiv i32 undef, 6
  %tmp4 = load i32, i32* undef, align 4
  %tmp5 = icmp eq i32 %tmp4, 4
  %tmp6 = select i1 %tmp5, i32 %tmp3, i32 %tmp2
  %tmp7 = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* undef, i64 0, i64 0, i64 0
  %tmp8 = zext i16 undef to i32
  %tmp9 = zext i16 undef to i32
  %tmp10 = load i16, i16* undef, align 2
  %tmp11 = zext i16 %tmp10 to i32
  %tmp12 = zext i16 undef to i32
  %tmp13 = zext i16 undef to i32
  %tmp14 = zext i16 undef to i32
  %tmp15 = load i16, i16* undef, align 2
  %tmp16 = zext i16 %tmp15 to i32
  %tmp17 = zext i16 undef to i32
  %tmp18 = sub nsw i32 %tmp8, %tmp9
  %tmp19 = shl nsw i32 undef, 1
  %tmp20 = add nsw i32 %tmp19, %tmp18
  %tmp21 = sub nsw i32 %tmp11, %tmp12
  %tmp22 = shl nsw i32 undef, 1
  %tmp23 = add nsw i32 %tmp22, %tmp21
  %tmp24 = sub nsw i32 %tmp13, %tmp14
  %tmp25 = shl nsw i32 undef, 1
  %tmp26 = add nsw i32 %tmp25, %tmp24
  %tmp27 = sub nsw i32 %tmp16, %tmp17
  %tmp28 = shl nsw i32 undef, 1
  %tmp29 = add nsw i32 %tmp28, %tmp27
  %tmp30 = sub nsw i32 %tmp20, %tmp29
  %tmp31 = sub nsw i32 %tmp23, %tmp26
  %tmp32 = shl nsw i32 %tmp30, 1
  %tmp33 = add nsw i32 %tmp32, %tmp31
  store i32 %tmp33, i32* undef, align 4
  %tmp34 = mul nsw i32 %tmp31, -2
  %tmp35 = add nsw i32 %tmp34, %tmp30
  store i32 %tmp35, i32* undef, align 4
  %tmp36 = select i1 %tmp5, i32 undef, i32 undef
  br label %bb37

bb37:                                             ; preds = %bb
  %tmp38 = load i32, i32* undef, align 4
  %tmp39 = ashr i32 %tmp38, %tmp6
  %tmp40 = load i32, i32* undef, align 4
  %tmp41 = sdiv i32 %tmp39, %tmp40
  store i32 %tmp41, i32* undef, align 4
  ret void
}

Thanks for the test case!
The problem is:

-- Before transformation
  renamable $esi = LEA64_32r renamable $rcx, 1, renamable $rcx, 0, $noreg
  renamable $edi = LEA64_32r renamable $rcx, 2, renamable $rcx, 0, $noreg
  $eax = MOV32rr $ecx, implicit killed $rcx    // $rcx is killed here
  renamable $eax = SUB32rr killed renamable $eax(tied-def 0), killed renamable $esi, implicit-def dead $eflags

-- After transformation
  renamable $edi = LEA64_32r renamable $rcx, 2, renamable $rcx, 0, $noreg
  $eax = MOV32rr $ecx, implicit killed $rcx    // $rcx should not be killed here
  $eax = SUB32rr $eax(tied-def 0), $ecx, implicit-def dead $eflags
  $eax = SUB32rr $eax(tied-def 0), $ecx, implicit-def dead $eflags    // $rcx should be killed here

The transformation may extend the live range of the original BaseReg and IndexReg, like $rcx in this test case, so the original kill flag should be cleared and a new kill flag should be added on the new instruction.
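The kill-flag bookkeeping described here can be sketched on a toy block. This is not the committed fix: `ToyUse`, `ToyBlock`, and `recomputeKill` are hypothetical names, each toy instruction reads at most one register, sub-registers such as $ecx and $rcx are treated as the same register id, and the register is assumed not to be live out of the block.

```cpp
#include <cstddef>

struct ToyUse {
  int Reg;   // register read by the instruction, -1 if none
  bool Kill; // true when this read is marked as the last use
};

struct ToyBlock {
  ToyUse Insts[4];
  std::size_t Size;
};

// Clears stale kill flags on Reg and puts the kill flag on the last
// instruction in the block that reads Reg. Walks the block backwards so
// the first read encountered is the final use.
constexpr ToyBlock recomputeKill(ToyBlock B, int Reg) {
  bool SeenLastUse = false;
  for (std::size_t I = B.Size; I-- > 0;) {
    if (B.Insts[I].Reg != Reg)
      continue;
    B.Insts[I].Kill = !SeenLastUse; // only the final read keeps the flag
    SeenLastUse = true;
  }
  return B;
}
```

Applied to the "after transformation" sequence (LEA, MOV with the stale kill, SUB, SUB, all reading the same register), the MOV loses its kill flag and the second SUB gains it.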

Revision Contents

Path                                                    Size
llvm/include/llvm/CodeGen/TargetInstrInfo.h             7 lines
llvm/lib/CodeGen/TwoAddressInstructionPass.cpp          5 lines
llvm/lib/Target/X86/X86FixupLEAs.cpp                    171 lines
llvm/lib/Target/X86/X86InstrInfo.h                      4 lines
llvm/lib/Target/X86/X86InstrInfo.cpp                    52 lines
llvm/test/CodeGen/X86/2009-03-23-MultiUseSched.ll       94 lines
llvm/test/CodeGen/X86/lea-opt2.ll                       74 lines
llvm/test/CodeGen/X86/vp2intersect_multiple_pairs.ll    14 lines

Diff 349017

llvm/include/llvm/CodeGen/TargetInstrInfo.h

[... 453 lines not shown; enclosing context: public: ...]
   /// unsigned Op1 = 1, Op2 = CommuteAnyOperandIndex;
   /// findCommutedOpIndices(MI, Op1, Op2);
   /// can be interpreted as a query asking to find an operand that would be
   /// commutable with the operand#1.
   virtual bool findCommutedOpIndices(const MachineInstr &MI,
                                      unsigned &SrcOpIdx1,
                                      unsigned &SrcOpIdx2) const;

+  /// Returns true if the target has a preference on the operands order of
+  /// the given machine instruction. And specify if \p Commute is required to
+  /// get the desired operands order.
+  virtual bool hasCommutePreference(MachineInstr &MI, bool &Commute) const {
+    return false;
+  }
+
   /// A pair composed of a register and a sub-register index.
   /// Used to give some type checking when modeling Reg:SubReg.
   struct RegSubRegPair {
     Register Reg;
     unsigned SubReg;

     RegSubRegPair(Register Reg = Register(), unsigned SubReg = 0)
         : Reg(Reg), SubReg(SubReg) {}
[... 1,551 lines not shown ...]

Inline comment on the added doc comment (marked done): RKSimon: Add doxygen description

llvm/lib/CodeGen/TwoAddressInstructionPass.cpp

[... 521 lines not shown; enclosing context: bool TwoAddressInstructionPass::isProfitableToCommute(Register RegA, ...]
   // instruction pass should be integrated with register allocation pass where
   // interference graph is available.
   if (isRevCopyChain(RegC, RegA, MaxDataFlowEdge))
     return true;

   if (isRevCopyChain(RegB, RegA, MaxDataFlowEdge))
     return false;

+  // Look for other target specific commute preference.
+  bool Commute;
+  if (TII->hasCommutePreference(*MI, Commute))
+    return Commute;
+
   // Since there are no intervening uses for both registers, then commute
   // if the def of RegC is closer. Its live interval is shorter.
   return LastDefB && LastDefC && LastDefC > LastDefB;
 }

 /// Commute a two-address instruction and update the basic block, distance map,
 /// and live variables if needed. Return true if it is successful.
 bool TwoAddressInstructionPass::commuteInstruction(MachineInstr *MI,
[... 1,190 lines not shown ...]

llvm/lib/Target/X86/X86FixupLEAs.cpp

Show First 20 Lines • Show All 73 Lines • ▼ Show 20 Lines	void processInstrForSlow3OpLEA(MachineBasicBlock::iterator &I,
MachineBasicBlock &MBB, bool OptIncDec);		MachineBasicBlock &MBB, bool OptIncDec);

/// Look for LEAs that are really two address LEAs that we might be able to		/// Look for LEAs that are really two address LEAs that we might be able to
/// turn into regular ADD instructions.		/// turn into regular ADD instructions.
bool optTwoAddrLEA(MachineBasicBlock::iterator &I,		bool optTwoAddrLEA(MachineBasicBlock::iterator &I,
MachineBasicBlock &MBB, bool OptIncDec,		MachineBasicBlock &MBB, bool OptIncDec,
bool UseLEAForSP) const;		bool UseLEAForSP) const;

		/// Look for and transform the sequence
		/// lea (reg1, reg2), reg3
		/// sub reg3, reg4
		/// to
		/// sub reg1, reg4
		/// sub reg2, reg4
		/// It can also optimize the sequence lea/add similarly.
		bool optLEAALU(MachineBasicBlock::iterator &I, MachineBasicBlock &MBB) const;

		/// Step forwards in MBB, looking for an ADD/SUB instruction which uses
		/// the dest register of LEA instruction I.
		MachineBasicBlock::iterator searchALUInst(MachineBasicBlock::iterator &I,
		MachineBasicBlock &MBB) const;

		/// Check instructions between LeaI and AluI (exclusively).
		/// Set BaseIndexDef to true if base or index register from LeaI is defined.
		/// Set AluDestRef to true if the dest register of AluI is used or defined.
		void checkRegUsage(MachineBasicBlock::iterator &LeaI,
		MachineBasicBlock::iterator &AluI, bool &BaseIndexDef,
		bool &AluDestRef) const;

/// Determine if an instruction references a machine register		/// Determine if an instruction references a machine register
/// and, if so, whether it reads or writes the register.		/// and, if so, whether it reads or writes the register.
RegUsageState usesRegister(MachineOperand &p, MachineBasicBlock::iterator I);		RegUsageState usesRegister(MachineOperand &p, MachineBasicBlock::iterator I);

/// Step backwards through a basic block, looking		/// Step backwards through a basic block, looking
/// for an instruction which writes a register within		/// for an instruction which writes a register within
/// a maximum of INSTR_DISTANCE_THRESHOLD instruction latency cycles.		/// a maximum of INSTR_DISTANCE_THRESHOLD instruction latency cycles.
MachineBasicBlock::iterator searchBackwards(MachineOperand &p,		MachineBasicBlock::iterator searchBackwards(MachineOperand &p,
[... 243 lines not shown ...]	static inline unsigned getADDrrFromLEA(unsigned LEAOpcode) {
case X86::LEA32r:		case X86::LEA32r:
case X86::LEA64_32r:		case X86::LEA64_32r:
return X86::ADD32rr;		return X86::ADD32rr;
case X86::LEA64r:		case X86::LEA64r:
return X86::ADD64rr;		return X86::ADD64rr;
}		}
}		}

		static inline unsigned getSUBrrFromLEA(unsigned LEAOpcode) {
		switch (LEAOpcode) {
		default:
		llvm_unreachable("Unexpected LEA instruction");
		case X86::LEA32r:
		case X86::LEA64_32r:
		return X86::SUB32rr;
		case X86::LEA64r:
		return X86::SUB64rr;
		}
		}

static inline unsigned getADDriFromLEA(unsigned LEAOpcode,		static inline unsigned getADDriFromLEA(unsigned LEAOpcode,
const MachineOperand &Offset) {		const MachineOperand &Offset) {
bool IsInt8 = Offset.isImm() && isInt<8>(Offset.getImm());		bool IsInt8 = Offset.isImm() && isInt<8>(Offset.getImm());
switch (LEAOpcode) {		switch (LEAOpcode) {
default:		default:
llvm_unreachable("Unexpected LEA instruction");		llvm_unreachable("Unexpected LEA instruction");
case X86::LEA32r:		case X86::LEA32r:
case X86::LEA64_32r:		case X86::LEA64_32r:
[... 10 lines not shown ...]	static inline unsigned getINCDECFromLEA(unsigned LEAOpcode, bool IsINC) {
case X86::LEA32r:		case X86::LEA32r:
case X86::LEA64_32r:		case X86::LEA64_32r:
return IsINC ? X86::INC32r : X86::DEC32r;		return IsINC ? X86::INC32r : X86::DEC32r;
case X86::LEA64r:		case X86::LEA64r:
return IsINC ? X86::INC64r : X86::DEC64r;		return IsINC ? X86::INC64r : X86::DEC64r;
}		}
}		}

		MachineBasicBlock::iterator
		FixupLEAPass::searchALUInst(MachineBasicBlock::iterator &I,
		MachineBasicBlock &MBB) const {
		const int InstrDistanceThreshold = 5;
		RKSimon (inline, done): const int InstrDistanceThreshold = 5;
		int InstrDistance = 1;
		MachineBasicBlock::iterator CurInst = std::next(I);

		unsigned LEAOpcode = I->getOpcode();
		unsigned AddOpcode = getADDrrFromLEA(LEAOpcode);
		unsigned SubOpcode = getSUBrrFromLEA(LEAOpcode);
		Register DestReg = I->getOperand(0).getReg();

		while (CurInst != MBB.end()) {
		if (CurInst->isCall() || CurInst->isInlineAsm())
		break;
		if (InstrDistance > InstrDistanceThreshold)
		break;

		// Check if the lea dest register is used in an add/sub instruction only.
		for (unsigned I = 0, E = CurInst->getNumOperands(); I != E; ++I) {
		MachineOperand &Opnd = CurInst->getOperand(I);
		RKSimon (inline, done): Please can you fix all these case style warnings: MachineOperand &Opnd = CurInst->getOperand(I);
		if (Opnd.isReg() && Opnd.getReg() == DestReg) {
		if (Opnd.isDef() || !Opnd.isKill())
		return MachineBasicBlock::iterator();

		unsigned AluOpcode = CurInst->getOpcode();
		if (AluOpcode != AddOpcode && AluOpcode != SubOpcode)
		return MachineBasicBlock::iterator();

		MachineOperand &Opnd2 = CurInst->getOperand(3 - I);
		MachineOperand AluDest = CurInst->getOperand(0);
		if (Opnd2.getReg() != AluDest.getReg())
		return MachineBasicBlock::iterator();

		// X - (Y + Z) may generate different flags than (X - Y) - Z when there
		// is overflow. So we can't change the alu instruction if the flags
		// register is live.
		if (!CurInst->registerDefIsDead(X86::EFLAGS, TRI))
		craig.topper (inline, not done): Why can't we just check the Dead flag on the EFLAGS def of the ALU op?
		Carrot (author; inline, done): Good catch! I copied the code from optTwoAddrLEA.
		craig.topper (inline, done): Can we use MachineInstr::registerDefIsDead?
		return MachineBasicBlock::iterator();

		return CurInst;
		}
		}

		InstrDistance++;
		++CurInst;
		}
		return MachineBasicBlock::iterator();
		}
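
The EFLAGS restriction in searchALUInst can be illustrated with a small standalone model. This is only a sketch, not LLVM code: `SubResult`, `sub32`, `fusedSub` and `splitSub` are hypothetical names, and only the carry flag of a 32-bit SUB is modelled. It shows that the two sequences always agree on the value but may disagree on the final flags.

```cpp
#include <cstdint>

// Value and carry flag (CF) produced by a 32-bit x86-style SUB (Dst - Src).
struct SubResult {
  uint32_t Value;
  bool Carry; // CF is set on an unsigned borrow, i.e. when Dst < Src.
};

inline SubResult sub32(uint32_t Dst, uint32_t Src) {
  return {static_cast<uint32_t>(Dst - Src), Dst < Src};
}

// Original sequence: lea (Y, Z), Tmp ; sub Tmp, X
inline SubResult fusedSub(uint32_t X, uint32_t Y, uint32_t Z) {
  uint32_t Tmp = Y + Z; // LEA wraps modulo 2^32 and leaves the flags alone.
  return sub32(X, Tmp);
}

// Rewritten sequence: sub Y, X ; sub Z, X
inline SubResult splitSub(uint32_t X, uint32_t Y, uint32_t Z) {
  SubResult First = sub32(X, Y);
  return sub32(First.Value, Z); // Only the flags of the last SUB remain visible.
}
```

With X = 0, Y = 1, Z = 1 both variants produce 0xFFFFFFFE, but the fused form ends with CF set while the split form ends with CF clear, which is why the rewrite is only attempted when the EFLAGS def of the ALU instruction is dead.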

		void FixupLEAPass::checkRegUsage(MachineBasicBlock::iterator &LeaI,
		MachineBasicBlock::iterator &AluI,
		bool &BaseIndexDef, bool &AluDestRef) const {
		BaseIndexDef = AluDestRef = false;
		Register BaseReg = LeaI->getOperand(1 + X86::AddrBaseReg).getReg();
		RKSimon (inline, done): for (unsigned I = 0, E = CurInst->getNumOperands(); I != E; ++I)
		Register IndexReg = LeaI->getOperand(1 + X86::AddrIndexReg).getReg();
		Register AluDestReg = AluI->getOperand(0).getReg();

		MachineBasicBlock::iterator CurInst = std::next(LeaI);
		while (CurInst != AluI) {
		for (unsigned I = 0, E = CurInst->getNumOperands(); I != E; ++I) {
		MachineOperand &Opnd = CurInst->getOperand(I);
		if (!Opnd.isReg())
		continue;
		Register Reg = Opnd.getReg();
		if (TRI->regsOverlap(Reg, AluDestReg))
		AluDestRef = true;
		if (Opnd.isDef() &&
		(TRI->regsOverlap(Reg, BaseReg) || TRI->regsOverlap(Reg, IndexReg))) {
		BaseIndexDef = true;
		}
		}
		++CurInst;
		}
		}

		bool FixupLEAPass::optLEAALU(MachineBasicBlock::iterator &I,
		MachineBasicBlock &MBB) const {
		// Look for an add/sub instruction which uses the result of lea.
		MachineBasicBlock::iterator AluI = searchALUInst(I, MBB);
		if (AluI == MachineBasicBlock::iterator())
		return false;

		// Check if there are any related register usage between lea and alu.
		bool BaseIndexDef, AluDestRef;
		checkRegUsage(I, AluI, BaseIndexDef, AluDestRef);

		MachineBasicBlock::iterator InsertPos = AluI;
		if (BaseIndexDef) {
		if (AluDestRef)
		return false;
		InsertPos = I;
		}

		// Check if there are same registers.
		Register AluDestReg = AluI->getOperand(0).getReg();
		Register BaseReg = I->getOperand(1 + X86::AddrBaseReg).getReg();
		Register IndexReg = I->getOperand(1 + X86::AddrIndexReg).getReg();
		if (I->getOpcode() == X86::LEA64_32r) {
		BaseReg = TRI->getSubReg(BaseReg, X86::sub_32bit);
		IndexReg = TRI->getSubReg(IndexReg, X86::sub_32bit);
		}
		if (AluDestReg == IndexReg) {
		if (BaseReg == IndexReg)
		return false;
		std::swap(BaseReg, IndexReg);
		}

		// Now it's safe to change instructions.
		MachineInstr *NewMI1, *NewMI2;
		unsigned NewOpcode = AluI->getOpcode();
		NewMI1 = BuildMI(MBB, InsertPos, AluI->getDebugLoc(), TII->get(NewOpcode),
		AluDestReg)
		.addReg(AluDestReg)
		.addReg(BaseReg);
		NewMI1->addRegisterDead(X86::EFLAGS, TRI);
		craig.topper (inline, done): This creates a second EFLAGS implicit def. The BuildMI would have already created one based on the MCInstrDesc. I checked the print-after-all output and saw this on the lea-opt2.ll test1:
		$ecx = SUB32rr $ecx(tied-def 0), $edx, implicit-def $eflags, implicit-def dead $eflags
		$ecx = SUB32rr $ecx(tied-def 0), $eax, implicit-def $eflags, implicit-def dead $eflags
		I added -verify-machineinstrs, but it doesn't detect this as an issue. That probably doesn't work well with findRegisterDefOperandIdx. I think what you want to do is this after creating the instruction:
		NewMI1->addRegisterDead(X86::EFLAGS, TRI);
		That will scan the operands and mark the existing EFLAGS operand as dead.
		NewMI2 = BuildMI(MBB, InsertPos, AluI->getDebugLoc(), TII->get(NewOpcode),
		AluDestReg)
		.addReg(AluDestReg)
		.addReg(IndexReg);
		NewMI2->addRegisterDead(X86::EFLAGS, TRI);

		MBB.getParent()->substituteDebugValuesForInst(AluI, NewMI1, 1);
		MBB.getParent()->substituteDebugValuesForInst(AluI, NewMI2, 1);
		MBB.erase(I);
		MBB.erase(AluI);
		I = NewMI1;
		return true;
		}
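
The register-aliasing handling in optLEAALU (the "Check if there are same registers" step) can be checked with a toy simulation. This is a sketch under stated assumptions, not LLVM code: `RegFile`, `runLeaSub` and `runSubSub` are hypothetical names, registers are plain array slots, and only the SUB form is modelled. When the ALU destination is also the LEA index register, the operands are swapped so that the aliasing register is subtracted first, before the destination is clobbered.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <utility>

// Toy register file: the array index plays the role of a register number.
using RegFile = std::array<uint32_t, 4>;

// Original sequence: lea (Base, Index), Tmp ; sub Tmp, Dest
inline uint32_t runLeaSub(RegFile R, unsigned Base, unsigned Index,
                          unsigned Dest) {
  assert(Base < R.size() && Index < R.size() && Dest < R.size());
  uint32_t Tmp = R[Base] + R[Index];
  R[Dest] -= Tmp;
  return R[Dest];
}

// Rewritten sequence: sub Base, Dest ; sub Index, Dest
// Mirrors the swap/bail-out logic of the patch. Returns false when the
// rewrite is rejected (Dest, Base and Index are all the same register).
inline bool runSubSub(RegFile R, unsigned Base, unsigned Index, unsigned Dest,
                      uint32_t &Out) {
  assert(Base < R.size() && Index < R.size() && Dest < R.size());
  if (Dest == Index) {
    if (Base == Index)
      return false;
    std::swap(Base, Index); // Subtract the register that aliases Dest first.
  }
  R[Dest] -= R[Base];
  R[Dest] -= R[Index];
  Out = R[Dest];
  return true;
}
```

Both functions take the register file by value, so each call starts from the same initial state and the two results can be compared directly for distinct registers, for Dest == Index, and for Dest == Base.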

bool FixupLEAPass::optTwoAddrLEA(MachineBasicBlock::iterator &I,		bool FixupLEAPass::optTwoAddrLEA(MachineBasicBlock::iterator &I,
MachineBasicBlock &MBB, bool OptIncDec,		MachineBasicBlock &MBB, bool OptIncDec,
bool UseLEAForSP) const {		bool UseLEAForSP) const {
MachineInstr &MI = *I;		MachineInstr &MI = *I;

const MachineOperand &Base = MI.getOperand(1 + X86::AddrBaseReg);		const MachineOperand &Base = MI.getOperand(1 + X86::AddrBaseReg);
const MachineOperand &Scale = MI.getOperand(1 + X86::AddrScaleAmt);		const MachineOperand &Scale = MI.getOperand(1 + X86::AddrScaleAmt);
const MachineOperand &Index = MI.getOperand(1 + X86::AddrIndexReg);		const MachineOperand &Index = MI.getOperand(1 + X86::AddrIndexReg);
[... 18 lines not shown ...]	if (MI.getOpcode() == X86::LEA64_32r) {
if (BaseReg != 0)		if (BaseReg != 0)
BaseReg = TRI->getSubReg(BaseReg, X86::sub_32bit);		BaseReg = TRI->getSubReg(BaseReg, X86::sub_32bit);
if (IndexReg != 0)		if (IndexReg != 0)
IndexReg = TRI->getSubReg(IndexReg, X86::sub_32bit);		IndexReg = TRI->getSubReg(IndexReg, X86::sub_32bit);
}		}

MachineInstr *NewMI = nullptr;		MachineInstr *NewMI = nullptr;

		// Case 1.
// Look for lea(%reg1, %reg2), %reg1 or lea(%reg2, %reg1), %reg1		// Look for lea(%reg1, %reg2), %reg1 or lea(%reg2, %reg1), %reg1
// which can be turned into add %reg2, %reg1		// which can be turned into add %reg2, %reg1
if (BaseReg != 0 && IndexReg != 0 && Disp.getImm() == 0 &&		if (BaseReg != 0 && IndexReg != 0 && Disp.getImm() == 0 &&
(DestReg == BaseReg || DestReg == IndexReg)) {		(DestReg == BaseReg || DestReg == IndexReg)) {
unsigned NewOpcode = getADDrrFromLEA(MI.getOpcode());		unsigned NewOpcode = getADDrrFromLEA(MI.getOpcode());
if (DestReg != BaseReg)		if (DestReg != BaseReg)
std::swap(BaseReg, IndexReg);		std::swap(BaseReg, IndexReg);

if (MI.getOpcode() == X86::LEA64_32r) {		if (MI.getOpcode() == X86::LEA64_32r) {
// TODO: Do we need the super register implicit use?		// TODO: Do we need the super register implicit use?
NewMI = BuildMI(MBB, I, MI.getDebugLoc(), TII->get(NewOpcode), DestReg)		NewMI = BuildMI(MBB, I, MI.getDebugLoc(), TII->get(NewOpcode), DestReg)
.addReg(BaseReg).addReg(IndexReg)		.addReg(BaseReg).addReg(IndexReg)
.addReg(Base.getReg(), RegState::Implicit)		.addReg(Base.getReg(), RegState::Implicit)
.addReg(Index.getReg(), RegState::Implicit);		.addReg(Index.getReg(), RegState::Implicit);
} else {		} else {
NewMI = BuildMI(MBB, I, MI.getDebugLoc(), TII->get(NewOpcode), DestReg)		NewMI = BuildMI(MBB, I, MI.getDebugLoc(), TII->get(NewOpcode), DestReg)
.addReg(BaseReg).addReg(IndexReg);		.addReg(BaseReg).addReg(IndexReg);
}		}
} else if (DestReg == BaseReg && IndexReg == 0) {		} else if (DestReg == BaseReg && IndexReg == 0) {
		// Case 2.
// This is an LEA with only a base register and a displacement,		// This is an LEA with only a base register and a displacement,
// We can use ADDri or INC/DEC.		// We can use ADDri or INC/DEC.

// Does this LEA have one these forms:		// Does this LEA have one these forms:
// lea %reg, 1(%reg)		// lea %reg, 1(%reg)
// lea %reg, -1(%reg)		// lea %reg, -1(%reg)
if (OptIncDec && (Disp.getImm() == 1 || Disp.getImm() == -1)) {		if (OptIncDec && (Disp.getImm() == 1 || Disp.getImm() == -1)) {
bool IsINC = Disp.getImm() == 1;		bool IsINC = Disp.getImm() == 1;
[... 14 lines not shown ...]	if (OptIncDec && (Disp.getImm() == 1 || Disp.getImm() == -1)) {
NewMI = BuildMI(MBB, I, MI.getDebugLoc(), TII->get(NewOpcode), DestReg)		NewMI = BuildMI(MBB, I, MI.getDebugLoc(), TII->get(NewOpcode), DestReg)
.addReg(BaseReg).addImm(Disp.getImm())		.addReg(BaseReg).addImm(Disp.getImm())
.addReg(Base.getReg(), RegState::Implicit);		.addReg(Base.getReg(), RegState::Implicit);
} else {		} else {
NewMI = BuildMI(MBB, I, MI.getDebugLoc(), TII->get(NewOpcode), DestReg)		NewMI = BuildMI(MBB, I, MI.getDebugLoc(), TII->get(NewOpcode), DestReg)
.addReg(BaseReg).addImm(Disp.getImm());		.addReg(BaseReg).addImm(Disp.getImm());
}		}
}		}
		} else if (BaseReg != 0 && IndexReg != 0 && Disp.getImm() == 0) {
		// Case 3.
		// Look for and transform the sequence
		// lea (reg1, reg2), reg3
		// sub reg3, reg4
		return optLEAALU(I, MBB);
} else		} else
return false;		return false;

MBB.getParent()->substituteDebugValuesForInst(I, NewMI, 1);		MBB.getParent()->substituteDebugValuesForInst(I, NewMI, 1);
MBB.erase(I);		MBB.erase(I);
I = NewMI;		I = NewMI;
return true;		return true;
}		}
[... 245 lines not shown ...]

llvm/lib/Target/X86/X86InstrInfo.h

[... 278 lines not shown ...]	public:
/// For example, calling this method this way:		/// For example, calling this method this way:
/// unsigned Op1 = 1, Op2 = CommuteAnyOperandIndex;		/// unsigned Op1 = 1, Op2 = CommuteAnyOperandIndex;
/// findCommutedOpIndices(MI, Op1, Op2);		/// findCommutedOpIndices(MI, Op1, Op2);
/// can be interpreted as a query asking to find an operand that would be		/// can be interpreted as a query asking to find an operand that would be
/// commutable with the operand#1.		/// commutable with the operand#1.
bool findCommutedOpIndices(const MachineInstr &MI, unsigned &SrcOpIdx1,		bool findCommutedOpIndices(const MachineInstr &MI, unsigned &SrcOpIdx1,
unsigned &SrcOpIdx2) const override;		unsigned &SrcOpIdx2) const override;

		/// Returns true if we have preference on the operands order in MI, the
		/// commute decision is returned in Commute.
		bool hasCommutePreference(MachineInstr &MI, bool &Commute) const override;

/// Returns an adjusted FMA opcode that must be used in FMA instruction that		/// Returns an adjusted FMA opcode that must be used in FMA instruction that
/// performs the same computations as the given \p MI but which has the		/// performs the same computations as the given \p MI but which has the
/// operands \p SrcOpIdx1 and \p SrcOpIdx2 commuted.		/// operands \p SrcOpIdx1 and \p SrcOpIdx2 commuted.
/// It may return 0 if it is unsafe to commute the operands.		/// It may return 0 if it is unsafe to commute the operands.
/// Note that a machine instruction (instead of its opcode) is passed as the		/// Note that a machine instruction (instead of its opcode) is passed as the
/// first parameter to make it possible to analyze the instruction's uses and		/// first parameter to make it possible to analyze the instruction's uses and
/// commute the first operand of FMA even when it seems unsafe when you look		/// commute the first operand of FMA even when it seems unsafe when you look
/// at the opcode. For example, it is Ok to commute the first operand of		/// at the opcode. For example, it is Ok to commute the first operand of
[... 340 lines not shown ...]

llvm/lib/Target/X86/X86InstrInfo.cpp


[... 2,664 lines not shown ...]	if (X86II::isKMasked(Desc.TSFlags)) {
return true;		return true;
}		}

return TargetInstrInfo::findCommutedOpIndices(MI, SrcOpIdx1, SrcOpIdx2);		return TargetInstrInfo::findCommutedOpIndices(MI, SrcOpIdx1, SrcOpIdx2);
}		}
return false;		return false;
}		}

		static bool isConvertibleLEA(MachineInstr *MI) {
		unsigned Opcode = MI->getOpcode();
		if (Opcode != X86::LEA32r && Opcode != X86::LEA64r &&
		Opcode != X86::LEA64_32r)
		return false;

		const MachineOperand &Scale = MI->getOperand(1 + X86::AddrScaleAmt);
		const MachineOperand &Disp = MI->getOperand(1 + X86::AddrDisp);
		const MachineOperand &Segment = MI->getOperand(1 + X86::AddrSegmentReg);

		if (Segment.getReg() != 0 || !Disp.isImm() || Disp.getImm() != 0 ||
		Scale.getImm() > 1)
		return false;

		return true;
		}

		bool X86InstrInfo::hasCommutePreference(MachineInstr &MI, bool &Commute) const {
		// Currently we're interested in following sequence only.
		// r3 = lea r1, r2
		// r5 = add r3, r4
		// Both r3 and r4 are killed in add, we hope the add instruction has the
		// operand order
		// r5 = add r4, r3
		// So later in X86FixupLEAs the lea instruction can be rewritten as add.
		unsigned Opcode = MI.getOpcode();
		if (Opcode != X86::ADD32rr && Opcode != X86::ADD64rr)
		return false;

		const MachineRegisterInfo &MRI = MI.getParent()->getParent()->getRegInfo();
		Register Reg1 = MI.getOperand(1).getReg();
		Register Reg2 = MI.getOperand(2).getReg();

		// Check if Reg1 comes from LEA in the same MBB.
		if (MachineInstr *Inst = MRI.getUniqueVRegDef(Reg1)) {
		craig.topper (inline, done): I think you might be able to use getUniqueVRegDef, which will return the MachineInstr directly.
		if (isConvertibleLEA(Inst) && Inst->getParent() == MI.getParent()) {
		RKSimon (inline, not done): Reduce scope: if (MachineOperand *Op = MRI.getOneDef(Reg1))
		Commute = true;
		return true;
		}
		}

		// Check if Reg2 comes from LEA in the same MBB.
		if (MachineInstr *Inst = MRI.getUniqueVRegDef(Reg2)) {
		if (isConvertibleLEA(Inst) && Inst->getParent() == MI.getParent()) {
		Commute = false;
		return true;
		RKSimon (inline, done): Reduce scope: if (MachineOperand *Op = MRI.getOneDef(Reg2))
		}
		}

		return false;
		}
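
The contract of hasCommutePreference (the return value says whether a preference exists, the out-parameter carries the decision) can be summarised with a standalone sketch. It is not LLVM code: `AddSources`, `hasCommutePreferenceSketch` and `applyDecision` are hypothetical names, and the two booleans stand in for "this source operand is defined by a convertible LEA in the same basic block".

```cpp
#include <utility>

// Hypothetical summary of the two source operands of an ADD.
struct AddSources {
  bool Src1FromLEA; // First source is defined by a convertible LEA.
  bool Src2FromLEA; // Second source is defined by a convertible LEA.
};

// Returns true if there is a preference; the decision is stored in Commute.
// The first source is checked first, as in the patch.
inline bool hasCommutePreferenceSketch(const AddSources &S, bool &Commute) {
  if (S.Src1FromLEA) {
    Commute = true; // Move the LEA result into the second source position.
    return true;
  }
  if (S.Src2FromLEA) {
    Commute = false; // Already in the wanted position, keep the order.
    return true;
  }
  return false; // No preference; the caller falls back to its own heuristic.
}

// Applies the decision the way a caller might: swap the sources if asked to.
inline AddSources applyDecision(AddSources S) {
  bool Commute = false;
  if (hasCommutePreferenceSketch(S, Commute) && Commute)
    std::swap(S.Src1FromLEA, S.Src2FromLEA);
  return S;
}
```

In this model, whenever exactly one source comes from an LEA, that source ends up in the second position after `applyDecision`, matching the "r5 = add r4, r3" order described in the comment of the patch.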

X86::CondCode X86::getCondFromBranch(const MachineInstr &MI) {		X86::CondCode X86::getCondFromBranch(const MachineInstr &MI) {
switch (MI.getOpcode()) {		switch (MI.getOpcode()) {
default: return X86::COND_INVALID;		default: return X86::COND_INVALID;
case X86::JCC_1:		case X86::JCC_1:
return static_cast<X86::CondCode>(		return static_cast<X86::CondCode>(
MI.getOperand(MI.getDesc().getNumOperands() - 1).getImm());		MI.getOperand(MI.getDesc().getNumOperands() - 1).getImm());
}		}
}		}
[... 6,397 lines not shown ...]

llvm/test/CodeGen/X86/2009-03-23-MultiUseSched.ll

	[... 23 lines not shown ...]
	; CHECK-NEXT: movq X(%rip), %rdx			; CHECK-NEXT: movq X(%rip), %rdx
	; CHECK-NEXT: addq %r15, %rdx			; CHECK-NEXT: addq %r15, %rdx
	; CHECK-NEXT: movq X(%rip), %rsi			; CHECK-NEXT: movq X(%rip), %rsi
	; CHECK-NEXT: bswapq %rsi			; CHECK-NEXT: bswapq %rsi
	; CHECK-NEXT: leaq (%r11,%r14), %rbx			; CHECK-NEXT: leaq (%r11,%r14), %rbx
	; CHECK-NEXT: addq %r15, %rbx			; CHECK-NEXT: addq %r15, %rbx
	; CHECK-NEXT: addq %rdx, %rbx			; CHECK-NEXT: addq %rdx, %rbx
	; CHECK-NEXT: addq %rsi, %rbx			; CHECK-NEXT: addq %rsi, %rbx
	; CHECK-NEXT: leaq (%r9,%r10), %rsi			; CHECK-NEXT: leaq (%r9,%r10), %rdx
	; CHECK-NEXT: leaq (%rsi,%r8), %rdx			; CHECK-NEXT: addq %rdx, %rdx
	; CHECK-NEXT: addq %rsi, %rdx			; CHECK-NEXT: addq %r8, %rdx
	; CHECK-NEXT: movq X(%rip), %rdi			; CHECK-NEXT: movq X(%rip), %rdi
	; CHECK-NEXT: addq %rbx, %r12			; CHECK-NEXT: addq %rbx, %r12
	; CHECK-NEXT: addq %r8, %rdx			; CHECK-NEXT: addq %r8, %rdx
	; CHECK-NEXT: bswapq %rdi			; CHECK-NEXT: bswapq %rdi
	; CHECK-NEXT: addq %rbx, %rdx			; CHECK-NEXT: addq %rbx, %rdx
	; CHECK-NEXT: leaq (%r15,%r14), %rsi			; CHECK-NEXT: leaq (%r15,%r14), %rsi
	; CHECK-NEXT: addq %r12, %rsi			; CHECK-NEXT: addq %r12, %rsi
	; CHECK-NEXT: addq %r11, %rdi			; CHECK-NEXT: addq %r11, %rdi
	; CHECK-NEXT: addq %rsi, %rdi			; CHECK-NEXT: addq %rsi, %rdi
	; CHECK-NEXT: leaq (%r10,%r8), %rbx			; CHECK-NEXT: leaq (%r10,%r8), %rsi
	; CHECK-NEXT: leaq (%rdx,%rbx), %rsi			; CHECK-NEXT: addq %rsi, %rsi
	; CHECK-NEXT: addq %rbx, %rsi			; CHECK-NEXT: addq %rdx, %rsi
	; CHECK-NEXT: movq X(%rip), %rbx			; CHECK-NEXT: movq X(%rip), %rbx
	; CHECK-NEXT: addq %r12, %rdi			; CHECK-NEXT: addq %r12, %rdi
	; CHECK-NEXT: addq %rdi, %r9			; CHECK-NEXT: addq %rdi, %r9
	; CHECK-NEXT: addq %rdx, %rsi			; CHECK-NEXT: addq %rdx, %rsi
	; CHECK-NEXT: addq %rdi, %rsi			; CHECK-NEXT: addq %rdi, %rsi
	; CHECK-NEXT: bswapq %rbx			; CHECK-NEXT: bswapq %rbx
	; CHECK-NEXT: leaq (%r12,%r15), %rdi			; CHECK-NEXT: leaq (%r12,%r15), %rdi
	; CHECK-NEXT: addq %r9, %rdi			; CHECK-NEXT: addq %r9, %rdi
	; CHECK-NEXT: addq %r14, %rbx			; CHECK-NEXT: addq %r14, %rbx
	; CHECK-NEXT: addq %rdi, %rbx			; CHECK-NEXT: addq %rdi, %rbx
	; CHECK-NEXT: leaq (%rdx,%r8), %rax			; CHECK-NEXT: leaq (%rdx,%r8), %rdi
	; CHECK-NEXT: leaq (%rsi,%rax), %rdi			; CHECK-NEXT: addq %rdi, %rdi
	; CHECK-NEXT: addq %rax, %rdi			; CHECK-NEXT: addq %rsi, %rdi
	; CHECK-NEXT: movq X(%rip), %rcx			; CHECK-NEXT: movq X(%rip), %rcx
	; CHECK-NEXT: addq %r9, %rbx			; CHECK-NEXT: addq %r9, %rbx
	; CHECK-NEXT: addq %rbx, %r10			; CHECK-NEXT: addq %rbx, %r10
	; CHECK-NEXT: addq %rsi, %rdi			; CHECK-NEXT: addq %rsi, %rdi
	; CHECK-NEXT: bswapq %rcx			; CHECK-NEXT: bswapq %rcx
	; CHECK-NEXT: addq %rbx, %rdi			; CHECK-NEXT: addq %rbx, %rdi
	; CHECK-NEXT: leaq (%r9,%r12), %rax			; CHECK-NEXT: leaq (%r9,%r12), %rax
	; CHECK-NEXT: addq %r10, %rax			; CHECK-NEXT: addq %r10, %rax
	; CHECK-NEXT: addq %r15, %rcx			; CHECK-NEXT: addq %r15, %rcx
	; CHECK-NEXT: addq %rax, %rcx			; CHECK-NEXT: addq %rax, %rcx
	; CHECK-NEXT: leaq (%rsi,%rdx), %rbx			; CHECK-NEXT: leaq (%rsi,%rdx), %r11
	; CHECK-NEXT: leaq (%rdi,%rbx), %r11			; CHECK-NEXT: addq %r11, %r11
	; CHECK-NEXT: addq %rbx, %r11			; CHECK-NEXT: addq %rdi, %r11
	; CHECK-NEXT: movq X(%rip), %rbx			; CHECK-NEXT: movq X(%rip), %rbx
	; CHECK-NEXT: addq %r10, %rcx			; CHECK-NEXT: addq %r10, %rcx
	; CHECK-NEXT: addq %rcx, %r8			; CHECK-NEXT: addq %rcx, %r8
	; CHECK-NEXT: addq %rdi, %r11			; CHECK-NEXT: addq %rdi, %r11
	; CHECK-NEXT: addq %rcx, %r11			; CHECK-NEXT: addq %rcx, %r11
	; CHECK-NEXT: bswapq %rbx			; CHECK-NEXT: bswapq %rbx
	; CHECK-NEXT: leaq (%r10,%r9), %rcx			; CHECK-NEXT: leaq (%r10,%r9), %rcx
	; CHECK-NEXT: addq %r8, %rcx			; CHECK-NEXT: addq %r8, %rcx
	; CHECK-NEXT: addq %r12, %rbx			; CHECK-NEXT: addq %r12, %rbx
	; CHECK-NEXT: addq %rcx, %rbx			; CHECK-NEXT: addq %rcx, %rbx
	; CHECK-NEXT: leaq (%rdi,%rsi), %rax			; CHECK-NEXT: leaq (%rdi,%rsi), %r14
	; CHECK-NEXT: leaq (%r11,%rax), %r14			; CHECK-NEXT: addq %r14, %r14
	; CHECK-NEXT: addq %rax, %r14			; CHECK-NEXT: addq %r11, %r14
	; CHECK-NEXT: movq X(%rip), %rax			; CHECK-NEXT: movq X(%rip), %rax
	; CHECK-NEXT: addq %r8, %rbx			; CHECK-NEXT: addq %r8, %rbx
	; CHECK-NEXT: addq %rbx, %rdx			; CHECK-NEXT: addq %rbx, %rdx
	; CHECK-NEXT: addq %r11, %r14			; CHECK-NEXT: addq %r11, %r14
	; CHECK-NEXT: bswapq %rax			; CHECK-NEXT: bswapq %rax
	; CHECK-NEXT: addq %rbx, %r14			; CHECK-NEXT: addq %rbx, %r14
	; CHECK-NEXT: leaq (%r8,%r10), %rbx			; CHECK-NEXT: leaq (%r8,%r10), %rbx
	; CHECK-NEXT: addq %rdx, %rbx			; CHECK-NEXT: addq %rdx, %rbx
	; CHECK-NEXT: addq %r9, %rax			; CHECK-NEXT: addq %r9, %rax
	; CHECK-NEXT: addq %rbx, %rax			; CHECK-NEXT: addq %rbx, %rax
	; CHECK-NEXT: leaq (%r11,%rdi), %rbx			; CHECK-NEXT: leaq (%r11,%rdi), %r9
	; CHECK-NEXT: leaq (%r14,%rbx), %r9			; CHECK-NEXT: addq %r9, %r9
	; CHECK-NEXT: addq %rbx, %r9			; CHECK-NEXT: addq %r14, %r9
	; CHECK-NEXT: movq X(%rip), %rbx			; CHECK-NEXT: movq X(%rip), %rbx
	; CHECK-NEXT: addq %rdx, %rax			; CHECK-NEXT: addq %rdx, %rax
	; CHECK-NEXT: addq %rax, %rsi			; CHECK-NEXT: addq %rax, %rsi
	; CHECK-NEXT: addq %r14, %r9			; CHECK-NEXT: addq %r14, %r9
	; CHECK-NEXT: addq %rax, %r9			; CHECK-NEXT: addq %rax, %r9
	; CHECK-NEXT: bswapq %rbx			; CHECK-NEXT: bswapq %rbx
	; CHECK-NEXT: leaq (%rdx,%r8), %rax			; CHECK-NEXT: leaq (%rdx,%r8), %rax
	; CHECK-NEXT: addq %rsi, %rax			; CHECK-NEXT: addq %rsi, %rax
	; CHECK-NEXT: addq %r10, %rbx			; CHECK-NEXT: addq %r10, %rbx
	; CHECK-NEXT: addq %rax, %rbx			; CHECK-NEXT: addq %rax, %rbx
	; CHECK-NEXT: leaq (%r14,%r11), %rax			; CHECK-NEXT: leaq (%r14,%r11), %r10
	; CHECK-NEXT: leaq (%r9,%rax), %r10			; CHECK-NEXT: addq %r10, %r10
	; CHECK-NEXT: addq %rax, %r10			; CHECK-NEXT: addq %r9, %r10
	; CHECK-NEXT: movq X(%rip), %rax			; CHECK-NEXT: movq X(%rip), %rax
	; CHECK-NEXT: addq %rsi, %rbx			; CHECK-NEXT: addq %rsi, %rbx
	; CHECK-NEXT: addq %rbx, %rdi			; CHECK-NEXT: addq %rbx, %rdi
	; CHECK-NEXT: addq %r9, %r10			; CHECK-NEXT: addq %r9, %r10
	; CHECK-NEXT: bswapq %rax			; CHECK-NEXT: bswapq %rax
	; CHECK-NEXT: addq %rbx, %r10			; CHECK-NEXT: addq %rbx, %r10
	; CHECK-NEXT: leaq (%rsi,%rdx), %rbx			; CHECK-NEXT: leaq (%rsi,%rdx), %rbx
	; CHECK-NEXT: addq %rdi, %rbx			; CHECK-NEXT: addq %rdi, %rbx
	; CHECK-NEXT: addq %r8, %rax			; CHECK-NEXT: addq %r8, %rax
	; CHECK-NEXT: addq %rbx, %rax			; CHECK-NEXT: addq %rbx, %rax
	; CHECK-NEXT: leaq (%r9,%r14), %rbx			; CHECK-NEXT: leaq (%r9,%r14), %r8
	; CHECK-NEXT: leaq (%r10,%rbx), %r8			; CHECK-NEXT: addq %r8, %r8
	; CHECK-NEXT: addq %rbx, %r8			; CHECK-NEXT: addq %r10, %r8
	; CHECK-NEXT: movq X(%rip), %rbx			; CHECK-NEXT: movq X(%rip), %rbx
	; CHECK-NEXT: addq %rdi, %rax			; CHECK-NEXT: addq %rdi, %rax
	; CHECK-NEXT: addq %rax, %r11			; CHECK-NEXT: addq %rax, %r11
	; CHECK-NEXT: addq %r10, %r8			; CHECK-NEXT: addq %r10, %r8
	; CHECK-NEXT: addq %rax, %r8			; CHECK-NEXT: addq %rax, %r8
	; CHECK-NEXT: bswapq %rbx			; CHECK-NEXT: bswapq %rbx
	; CHECK-NEXT: leaq (%rdi,%rsi), %rax			; CHECK-NEXT: leaq (%rdi,%rsi), %rax
	; CHECK-NEXT: addq %r11, %rax			; CHECK-NEXT: addq %r11, %rax
	; CHECK-NEXT: addq %rdx, %rbx			; CHECK-NEXT: addq %rdx, %rbx
	; CHECK-NEXT: addq %rax, %rbx			; CHECK-NEXT: addq %rax, %rbx
	; CHECK-NEXT: leaq (%r10,%r9), %rax			; CHECK-NEXT: leaq (%r10,%r9), %r15
	; CHECK-NEXT: leaq (%r8,%rax), %r15			; CHECK-NEXT: addq %r15, %r15
	; CHECK-NEXT: addq %rax, %r15			; CHECK-NEXT: addq %r8, %r15
	; CHECK-NEXT: movq X(%rip), %rax			; CHECK-NEXT: movq X(%rip), %rax
	; CHECK-NEXT: addq %r11, %rbx			; CHECK-NEXT: addq %r11, %rbx
	; CHECK-NEXT: addq %rbx, %r14			; CHECK-NEXT: addq %rbx, %r14
	; CHECK-NEXT: addq %r8, %r15			; CHECK-NEXT: addq %r8, %r15
	; CHECK-NEXT: bswapq %rax			; CHECK-NEXT: bswapq %rax
	; CHECK-NEXT: addq %rbx, %r15			; CHECK-NEXT: addq %rbx, %r15
	; CHECK-NEXT: leaq (%r11,%rdi), %rbx			; CHECK-NEXT: leaq (%r11,%rdi), %rbx
	; CHECK-NEXT: addq %r14, %rbx			; CHECK-NEXT: addq %r14, %rbx
	; CHECK-NEXT: addq %rsi, %rax			; CHECK-NEXT: addq %rsi, %rax
	; CHECK-NEXT: addq %rbx, %rax			; CHECK-NEXT: addq %rbx, %rax
	; CHECK-NEXT: leaq (%r8,%r10), %rbx			; CHECK-NEXT: leaq (%r8,%r10), %rsi
	; CHECK-NEXT: leaq (%r15,%rbx), %rsi			; CHECK-NEXT: addq %rsi, %rsi
	; CHECK-NEXT: addq %rbx, %rsi			; CHECK-NEXT: addq %r15, %rsi
	; CHECK-NEXT: movq X(%rip), %rbx			; CHECK-NEXT: movq X(%rip), %rbx
	; CHECK-NEXT: addq %r14, %rax			; CHECK-NEXT: addq %r14, %rax
	; CHECK-NEXT: addq %rax, %r9			; CHECK-NEXT: addq %rax, %r9
	; CHECK-NEXT: addq %r15, %rsi			; CHECK-NEXT: addq %r15, %rsi
	; CHECK-NEXT: addq %rax, %rsi			; CHECK-NEXT: addq %rax, %rsi
	; CHECK-NEXT: bswapq %rbx			; CHECK-NEXT: bswapq %rbx
	; CHECK-NEXT: leaq (%r14,%r11), %rax			; CHECK-NEXT: leaq (%r14,%r11), %rax
	; CHECK-NEXT: addq %r9, %rax			; CHECK-NEXT: addq %r9, %rax
	; CHECK-NEXT: addq %rdi, %rbx			; CHECK-NEXT: addq %rdi, %rbx
	; CHECK-NEXT: addq %rax, %rbx			; CHECK-NEXT: addq %rax, %rbx
	; CHECK-NEXT: leaq (%r15,%r8), %rax			; CHECK-NEXT: leaq (%r15,%r8), %r12
	; CHECK-NEXT: leaq (%rsi,%rax), %r12			; CHECK-NEXT: addq %r12, %r12
	; CHECK-NEXT: addq %rax, %r12			; CHECK-NEXT: addq %rsi, %r12
	; CHECK-NEXT: movq X(%rip), %rcx			; CHECK-NEXT: movq X(%rip), %rcx
	; CHECK-NEXT: addq %r9, %rbx			; CHECK-NEXT: addq %r9, %rbx
	; CHECK-NEXT: addq %rbx, %r10			; CHECK-NEXT: addq %rbx, %r10
	; CHECK-NEXT: addq %rsi, %r12			; CHECK-NEXT: addq %rsi, %r12
	; CHECK-NEXT: bswapq %rcx			; CHECK-NEXT: bswapq %rcx
	; CHECK-NEXT: addq %rbx, %r12			; CHECK-NEXT: addq %rbx, %r12
	; CHECK-NEXT: leaq (%r9,%r14), %rax			; CHECK-NEXT: leaq (%r9,%r14), %rax
	; CHECK-NEXT: addq %r10, %rax			; CHECK-NEXT: addq %r10, %rax
	; CHECK-NEXT: addq %r11, %rcx			; CHECK-NEXT: addq %r11, %rcx
	; CHECK-NEXT: addq %rax, %rcx			; CHECK-NEXT: addq %rax, %rcx
	; CHECK-NEXT: leaq (%rsi,%r15), %rbx			; CHECK-NEXT: leaq (%rsi,%r15), %rax
	; CHECK-NEXT: leaq (%r12,%rbx), %rax			; CHECK-NEXT: addq %rax, %rax
	; CHECK-NEXT: addq %rbx, %rax			; CHECK-NEXT: addq %r12, %rax
	; CHECK-NEXT: movq X(%rip), %rbx			; CHECK-NEXT: movq X(%rip), %rbx
	; CHECK-NEXT: addq %r10, %rcx			; CHECK-NEXT: addq %r10, %rcx
	; CHECK-NEXT: addq %rcx, %r8			; CHECK-NEXT: addq %rcx, %r8
	; CHECK-NEXT: addq %r12, %rax			; CHECK-NEXT: addq %r12, %rax
	; CHECK-NEXT: addq %rcx, %rax			; CHECK-NEXT: addq %rcx, %rax
	; CHECK-NEXT: bswapq %rbx			; CHECK-NEXT: bswapq %rbx
	; CHECK-NEXT: leaq (%r10,%r9), %rcx			; CHECK-NEXT: leaq (%r10,%r9), %rcx
	; CHECK-NEXT: addq %r8, %rcx			; CHECK-NEXT: addq %r8, %rcx
	; CHECK-NEXT: addq %r14, %rbx			; CHECK-NEXT: addq %r14, %rbx
	; CHECK-NEXT: addq %rcx, %rbx			; CHECK-NEXT: addq %rcx, %rbx
	; CHECK-NEXT: leaq (%r12,%rsi), %rdx			; CHECK-NEXT: leaq (%r12,%rsi), %rcx
	; CHECK-NEXT: leaq (%rax,%rdx), %rcx			; CHECK-NEXT: addq %rcx, %rcx
	; CHECK-NEXT: addq %rdx, %rcx			; CHECK-NEXT: addq %rax, %rcx
	; CHECK-NEXT: movq X(%rip), %rdx			; CHECK-NEXT: movq X(%rip), %rdx
	; CHECK-NEXT: addq %r8, %rbx			; CHECK-NEXT: addq %r8, %rbx
	; CHECK-NEXT: addq %rbx, %r15			; CHECK-NEXT: addq %rbx, %r15
	; CHECK-NEXT: addq %rax, %rcx			; CHECK-NEXT: addq %rax, %rcx
	; CHECK-NEXT: bswapq %rdx			; CHECK-NEXT: bswapq %rdx
	; CHECK-NEXT: addq %rbx, %rcx			; CHECK-NEXT: addq %rbx, %rcx
	; CHECK-NEXT: leaq (%r8,%r10), %rbx			; CHECK-NEXT: leaq (%r8,%r10), %rbx
	; CHECK-NEXT: addq %r15, %rbx			; CHECK-NEXT: addq %r15, %rbx
	; CHECK-NEXT: addq %r9, %rdx			; CHECK-NEXT: addq %r9, %rdx
	; CHECK-NEXT: addq %rbx, %rdx			; CHECK-NEXT: addq %rbx, %rdx
	; CHECK-NEXT: leaq (%rax,%r12), %r9			; CHECK-NEXT: leaq (%rax,%r12), %rbx
	; CHECK-NEXT: leaq (%rcx,%r9), %rbx			; CHECK-NEXT: addq %rbx, %rbx
	; CHECK-NEXT: addq %r9, %rbx			; CHECK-NEXT: addq %rcx, %rbx
	; CHECK-NEXT: addq %r15, %rdx			; CHECK-NEXT: addq %r15, %rdx
	; CHECK-NEXT: addq %rdx, %rsi			; CHECK-NEXT: addq %rdx, %rsi
	; CHECK-NEXT: addq %rcx, %rbx			; CHECK-NEXT: addq %rcx, %rbx
	; CHECK-NEXT: addq %rdx, %rbx			; CHECK-NEXT: addq %rdx, %rbx
	; CHECK-NEXT: movq X(%rip), %rdx			; CHECK-NEXT: movq X(%rip), %rdx
	; CHECK-NEXT: bswapq %rdx			; CHECK-NEXT: bswapq %rdx
	; CHECK-NEXT: addq %r10, %rdx			; CHECK-NEXT: addq %r10, %rdx
	; CHECK-NEXT: leaq (%r15,%r8), %rdi			; CHECK-NEXT: leaq (%r15,%r8), %rdi
	; CHECK-NEXT: addq %rsi, %rdi			; CHECK-NEXT: addq %rsi, %rdi
	; CHECK-NEXT: addq %rdi, %rdx			; CHECK-NEXT: addq %rdi, %rdx
	; CHECK-NEXT: addq %rax, %rcx			; CHECK-NEXT: addq %rax, %rcx
	; CHECK-NEXT: leaq (%rbx,%rcx), %rdi			; CHECK-NEXT: addq %rcx, %rcx
	; CHECK-NEXT: addq %rcx, %rdi			; CHECK-NEXT: addq %rbx, %rcx
	; CHECK-NEXT: addq %rbx, %rdi			; CHECK-NEXT: addq %rbx, %rcx
	; CHECK-NEXT: addq %rsi, %rdx			; CHECK-NEXT: addq %rsi, %rdx
	; CHECK-NEXT: addq %rdx, %r12			; CHECK-NEXT: addq %rdx, %r12
	; CHECK-NEXT: addq %rdx, %rdi			; CHECK-NEXT: addq %rdx, %rcx
	; CHECK-NEXT: addq %r15, %rsi			; CHECK-NEXT: addq %r15, %rsi
	; CHECK-NEXT: movq X(%rip), %rax			; CHECK-NEXT: movq X(%rip), %rax
	; CHECK-NEXT: bswapq %rax			; CHECK-NEXT: bswapq %rax
	; CHECK-NEXT: movq %rax, X(%rip)			; CHECK-NEXT: movq %rax, X(%rip)
	; CHECK-NEXT: addq %r8, %rax			; CHECK-NEXT: addq %r8, %rax
	; CHECK-NEXT: addq %r12, %rsi			; CHECK-NEXT: addq %r12, %rsi
	; CHECK-NEXT: addq %rsi, %rax			; CHECK-NEXT: addq %rsi, %rax
	; CHECK-NEXT: addq %r12, %rax			; CHECK-NEXT: addq %r12, %rax
	; CHECK-NEXT: addq %rdi, %rax			; CHECK-NEXT: addq %rcx, %rax
	; CHECK-NEXT: popq %rbx			; CHECK-NEXT: popq %rbx
	; CHECK-NEXT: popq %r12			; CHECK-NEXT: popq %r12
	; CHECK-NEXT: popq %r14			; CHECK-NEXT: popq %r14
	; CHECK-NEXT: popq %r15			; CHECK-NEXT: popq %r15
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%tmp = load volatile i64, i64* @X ; <i64> [#uses=7]			%tmp = load volatile i64, i64* @X ; <i64> [#uses=7]
	%tmp1 = load volatile i64, i64* @X ; <i64> [#uses=5]			%tmp1 = load volatile i64, i64* @X ; <i64> [#uses=5]
	%tmp2 = load volatile i64, i64* @X ; <i64> [#uses=3]			%tmp2 = load volatile i64, i64* @X ; <i64> [#uses=3]
	[228 lines not shown]

llvm/test/CodeGen/X86/lea-opt2.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=x86_64-unknown \| FileCheck %s			; RUN: llc < %s -mtriple=x86_64-unknown \| FileCheck %s
				RKSimon: I think it's ok to pre-commit this (with a suitable extra FIXME/TODO explanation comment) and then rebase the patch to show the diffs.
				Carrot: Will do.

	; This file tests following optimization			; This file tests following optimization
	;			;
	; leal (%rdx,%rax), %esi			; leal (%rdx,%rax), %esi
	; subl %esi, %ecx			; subl %esi, %ecx
	;			;
	; can be transformed to			; can be transformed to
	;			;
	; subl %edx, %ecx			; subl %edx, %ecx
	; subl %eax, %ecx			; subl %eax, %ecx
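
The rewrite described in the comment above relies on the fact that fixed-width integer arithmetic wraps, so `C - (A + B)` and `(C - A) - B` produce the same bits even when an intermediate result overflows. A minimal sketch of that identity, and of the analogous ADD case exercised by the later tests, is given below. The function names are hypothetical and exist only for illustration; they are not part of the patch, and the sketch models only the result value, not EFLAGS.

```cpp
#include <cassert>
#include <cstdint>

// Models:  leal (a,b), t ; subl t, c      (result left in c)
constexpr std::uint32_t lea_then_sub(std::uint32_t a, std::uint32_t b,
                                     std::uint32_t c) {
  return c - (a + b);
}

// Models:  subl a, c ; subl b, c          (result left in c)
constexpr std::uint32_t sub_then_sub(std::uint32_t a, std::uint32_t b,
                                     std::uint32_t c) {
  return (c - a) - b;
}

// Models:  leal (a,b), t ; addl c, t      (result left in t)
constexpr std::uint32_t lea_then_add(std::uint32_t a, std::uint32_t b,
                                     std::uint32_t c) {
  return (a + b) + c;
}

// Models:  addl a, c ; addl b, c          (result left in c)
constexpr std::uint32_t add_then_add(std::uint32_t a, std::uint32_t b,
                                     std::uint32_t c) {
  return (c + a) + b;
}

// Ordinary values.
static_assert(lea_then_sub(1u, 2u, 10u) == sub_then_sub(1u, 2u, 10u),
              "sub forms agree on small inputs");
// a + b wraps around; the two forms still agree.
static_assert(lea_then_sub(0xFFFFFFFFu, 2u, 0u) ==
                  sub_then_sub(0xFFFFFFFFu, 2u, 0u),
              "sub forms agree when the sum wraps");
static_assert(lea_then_add(0xFFFFFFFFu, 2u, 5u) ==
                  add_then_add(0xFFFFFFFFu, 2u, 5u),
              "add forms agree when the sum wraps");
```

Note that in the ADD case the result moves from the LEA destination to the third operand's register, which matches the `movl %esi, (%rdi)` to `movl %ecx, (%rdi)` change visible in test2 and test3.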

	; TODO: replace lea with sub.
	; C - (A + B) --> C - A - B			; C - (A + B) --> C - A - B
	define i32 @test1(i32* %p, i32 %a, i32 %b, i32 %c) {			define i32 @test1(i32* %p, i32 %a, i32 %b, i32 %c) {
	; CHECK-LABEL: test1:			; CHECK-LABEL: test1:
	; CHECK: # %bb.0: # %entry			; CHECK: # %bb.0: # %entry
	; CHECK-NEXT: # kill: def $edx killed $edx def $rdx			; CHECK-NEXT: # kill: def $edx killed $edx def $rdx
	; CHECK-NEXT: movl %esi, %eax			; CHECK-NEXT: movl %esi, %eax
	; CHECK-NEXT: leal (%rdx,%rax), %esi			; CHECK-NEXT: subl %edx, %ecx
	; CHECK-NEXT: subl %esi, %ecx			; CHECK-NEXT: subl %eax, %ecx
	; CHECK-NEXT: movl %ecx, (%rdi)			; CHECK-NEXT: movl %ecx, (%rdi)
	; CHECK-NEXT: subl %edx, %eax			; CHECK-NEXT: subl %edx, %eax
	; CHECK-NEXT: # kill: def $eax killed $eax killed $rax			; CHECK-NEXT: # kill: def $eax killed $eax killed $rax
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	entry:			entry:
	%0 = add i32 %b, %a			%0 = add i32 %b, %a
	%sub = sub i32 %c, %0			%sub = sub i32 %c, %0
	store i32 %sub, i32* %p, align 4			store i32 %sub, i32* %p, align 4
	%sub1 = sub i32 %a, %b			%sub1 = sub i32 %a, %b
	ret i32 %sub1			ret i32 %sub1
	}			}

	; TODO: replace lea with add.
	; (A + B) + C --> C + A + B			; (A + B) + C --> C + A + B
	define i32 @test2(i32* %p, i32 %a, i32 %b, i32 %c) {			define i32 @test2(i32* %p, i32 %a, i32 %b, i32 %c) {
	; CHECK-LABEL: test2:			; CHECK-LABEL: test2:
	; CHECK: # %bb.0: # %entry			; CHECK: # %bb.0: # %entry
	; CHECK-NEXT: # kill: def $edx killed $edx def $rdx			; CHECK-NEXT: # kill: def $edx killed $edx def $rdx
	; CHECK-NEXT: movl %esi, %eax			; CHECK-NEXT: movl %esi, %eax
	; CHECK-NEXT: leal (%rax,%rdx), %esi			; CHECK-NEXT: addl %eax, %ecx
	; CHECK-NEXT: addl %ecx, %esi			; CHECK-NEXT: addl %edx, %ecx
	; CHECK-NEXT: movl %esi, (%rdi)			; CHECK-NEXT: movl %ecx, (%rdi)
	; CHECK-NEXT: subl %edx, %eax			; CHECK-NEXT: subl %edx, %eax
	; CHECK-NEXT: # kill: def $eax killed $eax killed $rax			; CHECK-NEXT: # kill: def $eax killed $eax killed $rax
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	entry:			entry:
	%0 = add i32 %a, %b			%0 = add i32 %a, %b
	%1 = add i32 %c, %0			%1 = add i32 %c, %0
	store i32 %1, i32* %p, align 4			store i32 %1, i32* %p, align 4
	%sub1 = sub i32 %a, %b			%sub1 = sub i32 %a, %b
	ret i32 %sub1			ret i32 %sub1
	}			}

	; TODO: replace lea with add.
	; C + (A + B) --> C + A + B			; C + (A + B) --> C + A + B
	define i32 @test3(i32* %p, i32 %a, i32 %b, i32 %c) {			define i32 @test3(i32* %p, i32 %a, i32 %b, i32 %c) {
	; CHECK-LABEL: test3:			; CHECK-LABEL: test3:
	; CHECK: # %bb.0: # %entry			; CHECK: # %bb.0: # %entry
	; CHECK-NEXT: # kill: def $edx killed $edx def $rdx			; CHECK-NEXT: # kill: def $edx killed $edx def $rdx
	; CHECK-NEXT: movl %esi, %eax			; CHECK-NEXT: movl %esi, %eax
	; CHECK-NEXT: leal (%rax,%rdx), %esi			; CHECK-NEXT: addl %eax, %ecx
	; CHECK-NEXT: addl %ecx, %esi			; CHECK-NEXT: addl %edx, %ecx
	; CHECK-NEXT: movl %esi, (%rdi)			; CHECK-NEXT: movl %ecx, (%rdi)
	; CHECK-NEXT: subl %edx, %eax			; CHECK-NEXT: subl %edx, %eax
	; CHECK-NEXT: # kill: def $eax killed $eax killed $rax			; CHECK-NEXT: # kill: def $eax killed $eax killed $rax
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	entry:			entry:
	%0 = add i32 %a, %b			%0 = add i32 %a, %b
	%1 = add i32 %0, %c			%1 = add i32 %0, %c
	store i32 %1, i32* %p, align 4			store i32 %1, i32* %p, align 4
	%sub1 = sub i32 %a, %b			%sub1 = sub i32 %a, %b
	[16 lines not shown]
	entry:			entry:
	%0 = add i32 %b, %a			%0 = add i32 %b, %a
	%sub = sub i32 %0, %c			%sub = sub i32 %0, %c
	store i32 %sub, i32* %p, align 4			store i32 %sub, i32* %p, align 4
	%sub1 = sub i32 %a, %b			%sub1 = sub i32 %a, %b
	ret i32 %sub1			ret i32 %sub1
	}			}

	; TODO: replace lea with sub.
	define i64 @test5(i64* %p, i64 %a, i64 %b, i64 %c) {			define i64 @test5(i64* %p, i64 %a, i64 %b, i64 %c) {
	; CHECK-LABEL: test5:			; CHECK-LABEL: test5:
	; CHECK: # %bb.0: # %entry			; CHECK: # %bb.0: # %entry
	; CHECK-NEXT: movq (%rdi), %rax			; CHECK-NEXT: movq (%rdi), %rax
	; CHECK-NEXT: leaq (%rdx,%rax), %rsi			; CHECK-NEXT: subq %rdx, %rcx
	; CHECK-NEXT: subq %rsi, %rcx			; CHECK-NEXT: subq %rax, %rcx
	; CHECK-NEXT: movq %rcx, (%rdi)			; CHECK-NEXT: movq %rcx, (%rdi)
	; CHECK-NEXT: subq %rdx, %rax			; CHECK-NEXT: subq %rdx, %rax
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	entry:			entry:
	%ld = load i64, i64* %p, align 8			%ld = load i64, i64* %p, align 8
	%0 = add i64 %b, %ld			%0 = add i64 %b, %ld
	%sub = sub i64 %c, %0			%sub = sub i64 %c, %0
	store i64 %sub, i64* %p, align 8			store i64 %sub, i64* %p, align 8
	%sub1 = sub i64 %ld, %b			%sub1 = sub i64 %ld, %b
	ret i64 %sub1			ret i64 %sub1
	}			}

	; TODO: replace lea with add.
	define i64 @test6(i64* %p, i64 %a, i64 %b, i64 %c) {			define i64 @test6(i64* %p, i64 %a, i64 %b, i64 %c) {
	; CHECK-LABEL: test6:			; CHECK-LABEL: test6:
	; CHECK: # %bb.0: # %entry			; CHECK: # %bb.0: # %entry
	; CHECK-NEXT: movq (%rdi), %rax			; CHECK-NEXT: movq (%rdi), %rax
	; CHECK-NEXT: leaq (%rdx,%rax), %rsi			; CHECK-NEXT: addq %rdx, %rcx
	; CHECK-NEXT: addq %rcx, %rsi			; CHECK-NEXT: addq %rax, %rcx
	; CHECK-NEXT: movq %rsi, (%rdi)			; CHECK-NEXT: movq %rcx, (%rdi)
	; CHECK-NEXT: subq %rdx, %rax			; CHECK-NEXT: subq %rdx, %rax
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	entry:			entry:
	%ld = load i64, i64* %p, align 8			%ld = load i64, i64* %p, align 8
	%0 = add i64 %b, %ld			%0 = add i64 %b, %ld
	%1 = add i64 %0, %c			%1 = add i64 %0, %c
	store i64 %1, i64* %p, align 8			store i64 %1, i64* %p, align 8
	%sub1 = sub i64 %ld, %b			%sub1 = sub i64 %ld, %b
	ret i64 %sub1			ret i64 %sub1
	}			}

	; TODO: replace lea with add.
	define i64 @test7(i64* %p, i64 %a, i64 %b, i64 %c) {			define i64 @test7(i64* %p, i64 %a, i64 %b, i64 %c) {
	; CHECK-LABEL: test7:			; CHECK-LABEL: test7:
	; CHECK: # %bb.0: # %entry			; CHECK: # %bb.0: # %entry
	; CHECK-NEXT: movq (%rdi), %rax			; CHECK-NEXT: movq (%rdi), %rax
	; CHECK-NEXT: leaq (%rdx,%rax), %rsi			; CHECK-NEXT: addq %rdx, %rcx
	; CHECK-NEXT: addq %rcx, %rsi			; CHECK-NEXT: addq %rax, %rcx
	; CHECK-NEXT: movq %rsi, (%rdi)			; CHECK-NEXT: movq %rcx, (%rdi)
	; CHECK-NEXT: subq %rdx, %rax			; CHECK-NEXT: subq %rdx, %rax
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	entry:			entry:
	%ld = load i64, i64* %p, align 8			%ld = load i64, i64* %p, align 8
	%0 = add i64 %b, %ld			%0 = add i64 %b, %ld
	%1 = add i64 %c, %0			%1 = add i64 %c, %0
	store i64 %1, i64* %p, align 8			store i64 %1, i64* %p, align 8
	%sub1 = sub i64 %ld, %b			%sub1 = sub i64 %ld, %b
				RKSimon: are the nsw relevant?
				Carrot: It should not be relevant.
	ret i64 %sub1			ret i64 %sub1
	}			}

				; The sub instruction generated flags is used by following branch,
				; so it should not be transformed.
				define i64 @test8(i64* %p, i64 %a, i64 %b, i64 %c) {
				; CHECK-LABEL: test8:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: movq (%rdi), %rax
				; CHECK-NEXT: leaq (%rdx,%rax), %rsi
				; CHECK-NEXT: subq %rsi, %rcx
				; CHECK-NEXT: ja .LBB7_2
				; CHECK-NEXT: # %bb.1: # %then
				; CHECK-NEXT: movq %rcx, (%rdi)
				; CHECK-NEXT: subq %rdx, %rax
				; CHECK-NEXT: retq
				; CHECK-NEXT: .LBB7_2: # %else
				; CHECK-NEXT: movq $0, (%rdi)
				; CHECK-NEXT: subq %rdx, %rax
				; CHECK-NEXT: retq
				entry:
				%ld = load i64, i64* %p, align 8
				%0 = add i64 %b, %ld
				%sub = sub i64 %c, %0
				%cond = icmp ule i64 %c, %0
				br i1 %cond, label %then, label %else

				then:
				store i64 %sub, i64* %p, align 8
				br label %endif

				else:
				store i64 0, i64* %p, align 8
				br label %endif

				endif:
				%sub1 = sub i64 %ld, %b
				ret i64 %sub1
				}
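
test8 is left untransformed because a later `ja` consumes the flags produced by the single `sub`. The sketch below illustrates why splitting would be unsafe in that situation: the final value is the same, but the carry flag after the second `sub` of the split sequence can differ from the carry flag of the original `sub`. This is a simplified model that assumes only CF and ZF matter (the two flags `ja` reads); `SubResult`, `sub`, `ja_taken`, `lea_sub` and `sub_sub` are hypothetical names, not code from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Result of a modelled x86 SUB: the wrapped value plus the two flags
// that `ja` reads (CF = unsigned borrow, ZF = result is zero).
struct SubResult {
  std::uint64_t value;
  bool cf;
  bool zf;
};

// Models `subq src, dst` in AT&T operand order.
constexpr SubResult sub(std::uint64_t dst, std::uint64_t src) {
  return SubResult{dst - src, dst < src, dst == src};
}

// `ja` is taken when CF == 0 and ZF == 0.
constexpr bool ja_taken(const SubResult &r) { return !r.cf && !r.zf; }

// Original sequence:  leaq (a,b), t ; subq t, c
constexpr SubResult lea_sub(std::uint64_t a, std::uint64_t b,
                            std::uint64_t c) {
  return sub(c, a + b);
}

// Split sequence:     subq a, c ; subq b, c   (flags come from the 2nd sub)
constexpr SubResult sub_sub(std::uint64_t a, std::uint64_t b,
                            std::uint64_t c) {
  return sub(sub(c, a).value, b);
}

// With a = 10, b = 0, c = 5 both sequences compute the same value...
static_assert(lea_sub(10, 0, 5).value == sub_sub(10, 0, 5).value,
              "values agree");
// ...but the branch decision derived from the flags differs.
static_assert(ja_taken(lea_sub(10, 0, 5)) != ja_taken(sub_sub(10, 0, 5)),
              "flags disagree, so the split is unsafe when flags are live");
```

In this example the original `sub` borrows (CF set, `ja` not taken), while the second `sub` of the split sequence subtracts zero from an already wrapped value (CF clear, `ja` taken).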

llvm/test/CodeGen/X86/vp2intersect_multiple_pairs.ll

	[47 lines not shown]
	; X86-NEXT: kmovw %k0, %edi			; X86-NEXT: kmovw %k0, %edi
	; X86-NEXT: addl %edi, %eax			; X86-NEXT: addl %edi, %eax
	; X86-NEXT: kmovw {{[-0-9]+}}(%e{{[sb]}}p), %k2 # 2-byte Reload			; X86-NEXT: kmovw {{[-0-9]+}}(%e{{[sb]}}p), %k2 # 2-byte Reload
	; X86-NEXT: kmovw {{[-0-9]+}}(%e{{[sb]}}p), %k3 # 2-byte Reload			; X86-NEXT: kmovw {{[-0-9]+}}(%e{{[sb]}}p), %k3 # 2-byte Reload
	; X86-NEXT: kmovw %k2, %edi			; X86-NEXT: kmovw %k2, %edi
	; X86-NEXT: addl %ecx, %edx			; X86-NEXT: addl %ecx, %edx
	; X86-NEXT: kmovw %k1, %ecx			; X86-NEXT: kmovw %k1, %ecx
	; X86-NEXT: addl %edi, %ecx			; X86-NEXT: addl %edi, %ecx
	; X86-NEXT: addl %eax, %ecx			; X86-NEXT: addl %ecx, %eax
	; X86-NEXT: addl %edx, %ecx			; X86-NEXT: addl %edx, %eax
	; X86-NEXT: movw %cx, (%esi)			; X86-NEXT: movw %ax, (%esi)
	; X86-NEXT: leal -8(%ebp), %esp			; X86-NEXT: leal -8(%ebp), %esp
	; X86-NEXT: popl %esi			; X86-NEXT: popl %esi
	; X86-NEXT: popl %edi			; X86-NEXT: popl %edi
	; X86-NEXT: popl %ebp			; X86-NEXT: popl %ebp
	; X86-NEXT: retl			; X86-NEXT: retl
	;			;
	; X64-LABEL: test:			; X64-LABEL: test:
	; X64: # %bb.0: # %entry			; X64: # %bb.0: # %entry
	[35 lines not shown]
	; X64-NEXT: kmovw {{[-0-9]+}}(%r{{[sb]}}p), %k1 # 2-byte Reload			; X64-NEXT: kmovw {{[-0-9]+}}(%r{{[sb]}}p), %k1 # 2-byte Reload
	; X64-NEXT: kmovw %k0, %esi			; X64-NEXT: kmovw %k0, %esi
	; X64-NEXT: kmovw {{[-0-9]+}}(%r{{[sb]}}p), %k0 # 2-byte Reload			; X64-NEXT: kmovw {{[-0-9]+}}(%r{{[sb]}}p), %k0 # 2-byte Reload
	; X64-NEXT: kmovw {{[-0-9]+}}(%r{{[sb]}}p), %k1 # 2-byte Reload			; X64-NEXT: kmovw {{[-0-9]+}}(%r{{[sb]}}p), %k1 # 2-byte Reload
	; X64-NEXT: kmovw %k0, %edi			; X64-NEXT: kmovw %k0, %edi
	; X64-NEXT: kmovw %k1, %ebx			; X64-NEXT: kmovw %k1, %ebx
	; X64-NEXT: addl %edi, %eax			; X64-NEXT: addl %edi, %eax
	; X64-NEXT: addl %ecx, %edx			; X64-NEXT: addl %ecx, %edx
	; X64-NEXT: leal (%rbx,%rsi), %ecx			; X64-NEXT: addl %ebx, %eax
	; X64-NEXT: addl %eax, %ecx			; X64-NEXT: addl %esi, %eax
	; X64-NEXT: addl %edx, %ecx			; X64-NEXT: addl %edx, %eax
	; X64-NEXT: movw %cx, (%r14)			; X64-NEXT: movw %ax, (%r14)
	; X64-NEXT: leaq -16(%rbp), %rsp			; X64-NEXT: leaq -16(%rbp), %rsp
	; X64-NEXT: popq %rbx			; X64-NEXT: popq %rbx
	; X64-NEXT: popq %r14			; X64-NEXT: popq %r14
	; X64-NEXT: popq %rbp			; X64-NEXT: popq %rbp
	; X64-NEXT: retq			; X64-NEXT: retq
	entry:			entry:
	%0 = call { <16 x i1>, <16 x i1> } @llvm.x86.avx512.vp2intersect.d.512(<16 x i32> %a0, <16 x i32> %b0)			%0 = call { <16 x i1>, <16 x i1> } @llvm.x86.avx512.vp2intersect.d.512(<16 x i32> %a0, <16 x i32> %b0)
	%1 = call { <16 x i1>, <16 x i1> } @llvm.x86.avx512.vp2intersect.d.512(<16 x i32> %a1, <16 x i32> %b1)			%1 = call { <16 x i1>, <16 x i1> } @llvm.x86.avx512.vp2intersect.d.512(<16 x i32> %a1, <16 x i32> %b1)
	[34 lines not shown]

Revision Contents

Diff 349017
