This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contents

-
llvm/
-
include/llvm/Analysis/
-
llvm/
-
Analysis/
5/5
TargetTransformInfo.h
1/2
TargetTransformInfoImpl.h
-
lib/
-
Analysis/
-
TargetTransformInfo.cpp
-
CodeGen/
-
ExpandLargeDivRem.cpp
-
TargetPassConfig.cpp
-
Target/
-
AArch64/
-
AArch64TargetTransformInfo.h
-
ARM/
-
ARMTargetTransformInfo.h
-
X86/
-
X86TargetTransformInfo.h
-
X86TargetTransformInfo.cpp
-
test/CodeGen/
-
CodeGen/
-
AArch64/
-
O0-pipeline.ll
-
O3-pipeline.ll
-
udivmodei5.ll
-
ARM/
-
O3-pipeline.ll
-
udivmodei5.ll
-
X86/
-
O0-pipeline.ll
-
div-rem-pair-recomposition-signed.ll
-
div-rem-pair-recomposition-unsigned.ll
-
i128-sdiv.ll
-
i128-udiv.ll
-
libcall-sret.ll
-
opt-pipeline.ll
-
pr38539.ll
-
udivmodei5.ll

Differential D130076

[llvm/CodeGen] Enable the ExpandLargeDivRem pass for X86, Arm and AArch64
Closed · Public

Authored by mgehre-amd on Jul 19 2022, 4:34 AM.


Details

Reviewers

arsenm
pengfei
FreddyYe

Commits

rG2090e85fee9b: [llvm/CodeGen] Enable the ExpandLargeDivRem pass for X86, Arm and AArch64

Summary

This adds the ExpandLargeDivRem pass [0] to the default pass pipeline.
The bit width above which it expands div/rem instructions is configured
via a new TargetTransformInfo hook (default: no expansion).
The X86, Arm and AArch64 backends implement this hook to expand div/rem
instructions with more than 128 bits.

[0] https://reviews.llvm.org/D126644
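
As a rough illustration of the hook described in the summary, the sketch below models it outside of LLVM. The class names `BaseTTI` and `WideDivTargetTTI`, the helper `shouldExpand`, and the constant `kMaxIntBits` (including its value) are hypothetical stand-ins, not the real `TargetTransformInfo` API; only the method name `maxLegalDivRemBitWidth` is taken from this revision.

```cpp
#include <cassert>

// Hypothetical stand-in for llvm::IntegerType::MAX_INT_BITS. The value is an
// assumption chosen for illustration only.
constexpr unsigned kMaxIntBits = 1u << 23;

// Default behaviour: report that every width is acceptable, so the pass
// never expands anything.
struct BaseTTI {
  virtual ~BaseTTI() = default;
  virtual unsigned maxLegalDivRemBitWidth() const { return kMaxIntBits; }
};

// A target that opts in: div/rem wider than 128 bits must be expanded.
struct WideDivTargetTTI : BaseTTI {
  unsigned maxLegalDivRemBitWidth() const override { return 128; }
};

// A div/rem instruction is expanded only if it is wider than the limit.
bool shouldExpand(unsigned BitWidth, const BaseTTI &TTI) {
  assert(BitWidth > 0 && "integer types have at least one bit");
  return BitWidth > TTI.maxLegalDivRemBitWidth();
}
```

With these definitions, a 256-bit division is left alone under the default and expanded for the opted-in target, while a 128-bit division is left alone in both cases.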

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

mgehre-amd created this revision. Jul 19 2022, 4:34 AM

Herald added a project: Restricted Project. Jul 19 2022, 4:34 AM

Herald added subscribers: jsji, hiraditya, kristof.beyls.

mgehre-amd requested review of this revision. Jul 19 2022, 4:34 AM

Herald added a subscriber: wdng.

Harbormaster completed remote builds in B176224: Diff 445776. Jul 19 2022, 4:34 AM

mgehre-amd mentioned this in D122234: [clang] Link libbitint for large division of _BitInt. Jul 19 2022, 4:36 AM

mgehre-amd edited the summary of this revision.

mgehre-amd mentioned this in D123363: [SelectionDAG] Update emission of udivmodei5 to latest ABI changes.

mgehre-amd mentioned this in D120327: compiler-rt: Add udivmodei5 to builtins and add bitint library. Jul 19 2022, 4:38 AM

mgehre-amd mentioned this in D130079: Revert "[SelectionDAG] Emit calls to __divei4 and friends for division/remainder of large integers". Jul 19 2022, 5:04 AM

mgehre-amd mentioned this in D126644: [llvm/CodeGen] Add ExpandLargeDivRem pass. Jul 28 2022, 3:02 PM

Set limit to 64 bit for ARM and x86_32
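
A minimal sketch of what a subtarget-dependent limit like the one in this update could look like. The `Subtarget` struct and its `Is64Bit` flag are hypothetical; the real backends query their own subtarget classes. The 64-bit value comes from the update note above and the 128-bit value from the summary.

```cpp
#include <cassert>

// Hypothetical subtarget description; not an LLVM class.
struct Subtarget {
  bool Is64Bit;
};

// 32-bit targets (for example ARM or 32-bit x86) get the lower limit,
// 64-bit targets keep the 128-bit limit from the summary.
unsigned maxLegalDivRemBitWidth(const Subtarget &ST) {
  return ST.Is64Bit ? 128u : 64u;
}
```

Under this sketch an `i128` division would be expanded on a 32-bit target but left to the backend on a 64-bit one.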

Harbormaster completed remote builds in B178160: Diff 448456. Jul 28 2022, 3:35 PM

craig.topper added a parent revision: D126644: [llvm/CodeGen] Add ExpandLargeDivRem pass. Jul 28 2022, 7:07 PM

Thank you for working on this @mgehre-amd!

Pinging for review; I'm hoping we can improve our _BitInt support with these changes (that I'm unqualified to review myself).

Adding some more reviewers for x86 who were helping with prior reviews in this area.

Thanks @aaron.ballman for helping out with the reviewers!

Is this to address @arsenm's comments at https://reviews.llvm.org/D126644#3581469?

In D130076#3728779, @pengfei wrote:

Is this to address @arsenm's comments at https://reviews.llvm.org/D126644#3581469?

Yes, this PR introduces the hooks into TargetTransformInfo so targets can decide when/if the lowering needs to occur.
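
The way the pass combines the target's answer with the `expand-div-rem-bits` option can be sketched as follows. This is a standalone model of the threshold selection visible in the `runImpl` changes of this diff, not the LLVM code itself; `kMaxIntBits`, `effectiveLimit` and `passIsNoOp` are hypothetical names, and the constant's value is an assumption.

```cpp
#include <cassert>

// Hypothetical stand-in for llvm::IntegerType::MAX_INT_BITS (value assumed).
constexpr unsigned kMaxIntBits = 1u << 23;

// The command-line value wins whenever it was changed from its default;
// otherwise the target's limit is used.
unsigned effectiveLimit(unsigned TargetLimit, unsigned FlagValue) {
  return FlagValue != kMaxIntBits ? FlagValue : TargetLimit;
}

// A limit at or above the maximum integer width means there is nothing to
// expand, so the pass can return early.
bool passIsNoOp(unsigned Limit) { return Limit >= kMaxIntBits; }
```

One consequence of this design is that a target which does not override the hook keeps the previous behaviour unless the option is passed explicitly.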

The tests are too prolix to check for correctness, and it's fragile to use so many registers. I suggest we remove those tests and just do run-time testing in llvm-test-suite.

Matthias Gehre <matthias.gehre@xilinx.com> mentioned this in rG6d13b80fcb1a: Revert "[SelectionDAG] Emit calls to __divei4 and friends for…. Aug 26 2022, 2:53 AM

I created a single-source demo test, https://reviews.llvm.org/D132850, which is a _BitInt(256) division test. Unit tests should be able to cover the pass and be easy to review. @mgehre-amd, would that help, WDYT? I can help extend the tests if it's the right direction.

In D130076#3751142, @pengfei wrote:

The tests are too prolix to check for correctness, and it's fragile to use so many registers. I suggest we remove those tests and just do run-time testing in llvm-test-suite.

Yes, you are right. I will restrict the IR tests to maybe just check for the absence of call instructions? And at least check that llc doesn't assert like it used to.

In D130076#3755383, @FreddyYe wrote:

I created a single-source demo test, https://reviews.llvm.org/D132850, which is a _BitInt(256) division test. Unit tests should be able to cover the pass and be easy to review. @mgehre-amd, would that help, WDYT? I can help extend the tests if it's the right direction.

Thank you, this looks like the right approach!

In D130076#3766415, @mgehre-amd wrote:

In D130076#3751142, @pengfei wrote:

The tests are too prolix to check for correctness, and it's fragile to use so many registers. I suggest we remove those tests and just do run-time testing in llvm-test-suite.

Yes, you are right. I will restrict the IR tests to maybe just check for the absence of call instructions? And at least check that llc doesn't assert like it used to.

Yes, checking no call instructions sounds good.

Simplified tests to only check for NOT: call

Rebased on main

Disable expand-large-div-rem when the second operand of div/rem is a power-of-two constant. For those, the backend has peephole optimizations, see llvm/test/CodeGen/X86/i128-sdiv.ll
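
The power-of-two exemption can be illustrated on plain 64-bit integers. This is only a scalar analogue of the check; the pass itself works on constant operands of arbitrary width. The function name `isPowerOfTwoDivisor` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// IsSignedOp is true for sdiv/srem, where a negative divisor such as -8
// still counts because its magnitude is a power of two.
bool isPowerOfTwoDivisor(std::int64_t Divisor, bool IsSignedOp) {
  // Work on the unsigned bit pattern so that negating the most negative
  // value is well defined.
  std::uint64_t Bits = static_cast<std::uint64_t>(Divisor);
  if (IsSignedOp && Divisor < 0)
    Bits = std::uint64_t{0} - Bits;
  // Exactly one set bit means a power of two; zero is rejected.
  return Bits != 0 && (Bits & (Bits - 1)) == 0;
}
```

For an unsigned operation the bit pattern of -8 has many bits set, so it is not treated as a power of two.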

Updated tests after rebasing:
- llvm/test/CodeGen/X86/div-rem-pair-recomposition-{signed,unsigned}.ll, llvm/test/CodeGen/X86/i128-{udiv,sdiv}.ll: Those tests were checking that a call to __divti3 was created on X86 for 128-bit divisions, which makes no sense because __divti3 is implemented neither by libgcc nor by compiler-rt on X86 in 32-bit mode.
- Removed test llvm/test/CodeGen/X86/libcall-sret.ll: It was checking that the backend doesn't crash when emitting a 128-bit libcall on 32-bit X86. Now that we don't emit calls to __divti3 on X86 anymore, there is no 128-bit libcall left to reproduce the issue.
- llvm/test/CodeGen/X86/pr38539.ll: Removed the X86 section from f(); the comment says that f() is targeted at 64-bit mode. In 32-bit mode, we would now get a urem expansion.

Harbormaster completed remote builds in B185186: Diff 458108. Sep 6 2022, 12:57 AM

I just updated the runtime test (https://reviews.llvm.org/D132850) with some extensions; it works correctly in my local environment. But the pre-merge build does not seem to work correctly, and I'm afraid there may be some endianness issues in my code. Anyway, it can be refined later.

LGTM. Thanks @mgehre-amd for your great work implementing this feature!

This revision is now accepted and ready to land. Sep 6 2022, 1:35 AM

This revision was landed with ongoing or failed builds. Sep 6 2022, 7:32 AM

Closed by commit rG2090e85fee9b: [llvm/CodeGen] Enable the ExpandLargeDivRem pass for X86, Arm and AArch64 (authored by mgehre-amd).

This revision was automatically updated to reflect the committed changes.

Matthias Gehre <matthias.gehre@xilinx.com> added a commit: rG2090e85fee9b: [llvm/CodeGen] Enable the ExpandLargeDivRem pass for X86, Arm and AArch64.

melver added a subscriber: melver. Sep 6 2022, 7:49 AM

melver added inline comments.

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h

294

unsigned?

https://lab.llvm.org/buildbot/#/builders/13/builds/25460

C:\PROGRA~2\MICROS~3\2019\COMMUN~1\VC\Tools\MSVC\1429~1.301\bin\Hostx64\x64\cl.exe  /nologo /TP -DBUILD_EXAMPLES -DGTEST_HAS_RTTI=0 -DUNICODE -D_CRT_NONSTDC_NO_DEPRECATE -D_CRT_NONSTDC_NO_WARNINGS -D_CRT_SECURE_NO_DEPRECATE -D_CRT_SECURE_NO_WARNINGS -D_HAS_EXCEPTIONS=0 -D_SCL_SECURE_NO_DEPRECATE -D_SCL_SECURE_NO_WARNINGS -D_UNICODE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -IC:\buildbot\mlir-x64-windows-ninja\build\lib\CodeGen -IC:\buildbot\mlir-x64-windows-ninja\llvm-project\llvm\lib\CodeGen -IC:\buildbot\mlir-x64-windows-ninja\build\include -IC:\buildbot\mlir-x64-windows-ninja\llvm-project\llvm\include /DWIN32 /D_WINDOWS   /WX /Zc:inline /Zc:__cplusplus /Oi /bigobj /permissive- /W4 -wd4141 -wd4146 -wd4244 -wd4267 -wd4291 -wd4351 -wd4456 -wd4457 -wd4458 -wd4459 -wd4503 -wd4624 -wd4722 -wd4100 -wd4127 -wd4512 -wd4505 -wd4610 -wd4510 -wd4702 -wd4245 -wd4706 -wd4310 -wd4701 -wd4703 -wd4389 -wd4611 -wd4805 -wd4204 -wd4577 -wd4091 -wd4592 -wd4319 -wd4709 -wd4324 -w14062 -we4238 /Gw /MD /O2 /Ob2  /EHs-c- /GR- -UNDEBUG -std:c++17 /showIncludes /Folib\CodeGen\CMakeFiles\LLVMCodeGen.dir\LLVMTargetMachine.cpp.obj /Fdlib\CodeGen\CMakeFiles\LLVMCodeGen.dir\LLVMCodeGen.pdb /FS -c C:\buildbot\mlir-x64-windows-ninja\llvm-project\llvm\lib\CodeGen\LLVMTargetMachine.cpp
C:\buildbot\mlir-x64-windows-ninja\llvm-project\llvm\include\llvm/Analysis/TargetTransformInfoImpl.h(295): error C2220: the following warning is treated as an error
C:\buildbot\mlir-x64-windows-ninja\llvm-project\llvm\include\llvm/Analysis/TargetTransformInfoImpl.h(295): warning C4305: 'return': truncation from 'llvm::IntegerType::<unnamed-enum-MIN_INT_BITS>' to 'bool'

mgehre-amd added inline comments. Sep 6 2022, 8:08 AM

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
294	Yes, sorry, I'm about to push a fix
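
The effect behind warning C4305 can be reproduced in isolation. In the sketch below, `kMaxIntBits` is a hypothetical stand-in for `llvm::IntegerType::MAX_INT_BITS` with an assumed value, and both function names are invented for illustration. The explicit cast performs the same conversion that the `bool` return type performed implicitly, so the example compiles without the warning.

```cpp
#include <cassert>

// Hypothetical stand-in for llvm::IntegerType::MAX_INT_BITS (value assumed).
constexpr unsigned kMaxIntBits = 1u << 23;

// What a `bool` return type does to the constant: any non-zero value
// collapses to `true`.
bool limitWithBoolReturn() { return static_cast<bool>(kMaxIntBits); }

// With an `unsigned` return type the value survives intact.
unsigned limitWithUnsignedReturn() { return kMaxIntBits; }
```

A caller that stores the `bool` result in an `unsigned` sees 1 rather than the intended maximum, which is why the return type matters here.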

Breaks https://lab.llvm.org/buildbot/#/builders/85/builds/10520

******************** TEST 'LLVM :: Transforms/ExpandLargeDivRem/udiv129.ll' FAILED ********************
Script:
--
: 'RUN: at line 2';   /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/opt -S -expand-large-div-rem < /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/test/Transforms/ExpandLargeDivRem/udiv129.ll | /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/bin/FileCheck /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/test/Transforms/ExpandLargeDivRem/udiv129.ll
--
Exit Code: 1

Command Output (stderr):
--
/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/test/Transforms/ExpandLargeDivRem/udiv129.ll:6:15: error: CHECK-NEXT: expected string not found in input
; CHECK-NEXT: _udiv-special-cases:
              ^
<stdin>:5:19: note: scanning from here
define void @test(i129* %ptr, i129* %out) #0 {
                  ^
<stdin>:7:9: note: possible intended match here
 %res = udiv i129 %a, 3
        ^

Input file: <stdin>
Check file: /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/test/Transforms/ExpandLargeDivRem/udiv129.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
          1: ; ModuleID = '<stdin>' 
          2: source_filename = "<stdin>" 
          3:  
          4: ; Function Attrs: nounwind 
          5: define void @test(i129* %ptr, i129* %out) #0 { 
next:6'0                       X~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
          6:  %a = load i129, i129* %ptr, align 4 
next:6'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          7:  %res = udiv i129 %a, 3 
next:6'0     ~~~~~~~~~~~~~~~~~~~~~~~~
next:6'1             ?                possible intended match
          8:  store i129 %res, i129* %out, align 4 
next:6'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          9:  ret void 
next:6'0     ~~~~~~~~~~
         10: } 
next:6'0     ~~
         11:  
next:6'0     ~
         12: attributes #0 = { nounwind } 
next:6'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>>

In D130076#3772580, @vitalybuka wrote:

Breaks https://lab.llvm.org/buildbot/#/builders/85/builds/10520

Thanks for letting me know, I already pushed a fix.

In D130076#3771515, @FreddyYe wrote:

LGTM. Thanks @mgehre-amd for your great work implementing this feature!

Agreed, thank you for this work! Do you agree that we can now bump BITINT_MAXWIDTH in Clang above 128, or are we still missing support for that (floating-point conversions were also a problem IIRC)?

In D130076#3777630, @aaron.ballman wrote:

In D130076#3771515, @FreddyYe wrote:

LGTM. Thanks @mgehre-amd for your great work implementing this feature!

Agreed, thank you for this work! Do you agree that we can now bump BITINT_MAXWIDTH in Clang above 128, or are we still missing support for that (floating-point conversions were also a problem IIRC)?

Thanks back to everyone who helped by reviewing!
Unfortunately, float-to-big-int and big-int-to-float conversions still crash in the backend.
I want to take a look at this next.

In D130076#3778557, @mgehre-amd wrote:

In D130076#3777630, @aaron.ballman wrote:

In D130076#3771515, @FreddyYe wrote:

LGTM. Thanks @mgehre-amd for your great work implementing this feature!

Agreed, thank you for this work! Do you agree that we can now bump BITINT_MAXWIDTH in Clang above 128, or are we still missing support for that (floating-point conversions were also a problem IIRC)?

Thanks back to everyone who helped by reviewing!
Unfortunately, float-to-big-int and big-int-to-float conversions still crash in the backend.
I want to take a look at this next.

Thank you for confirming and the offer to look into fixing up that support as well. It's greatly appreciated!

arsenm added inline comments. Sep 9 2022, 6:13 AM

llvm/include/llvm/Analysis/TargetTransformInfo.h
690–691	TargetTransformInfo isn't really the appropriate place to put something for a lowering decision. TargetLowering would make more sense

arsenm added inline comments. Sep 9 2022, 6:21 AM

llvm/include/llvm/Analysis/TargetTransformInfo.h
690–691	The name here is also misleading. It's not the max legal width, just the maximum codegen supports. 128 is still not really legal in the normal use of the term

mgehre-amd marked 2 inline comments as done. Sep 12 2022, 1:27 AM

mgehre-amd added inline comments.

llvm/include/llvm/Analysis/TargetTransformInfo.h
690–691	Hi @arsenm, thanks for pointing this out! I prepared a PR to move this over to TargetLowering under the name `maxSupportedDivRemBitWidth`. This works with the old pass manager, but I cannot figure out how to get a `TargetLowering`/`TargetPassConfig` with the new pass manager. Do you have an idea?

arsenm added inline comments. Sep 12 2022, 5:51 AM

llvm/include/llvm/Analysis/TargetTransformInfo.h
690–691	CodeGen is only using the old pass manager so it's a bit of a moot point right now

mgehre-amd marked 2 inline comments as done. Sep 12 2022, 6:30 AM

mgehre-amd added inline comments.

llvm/include/llvm/Analysis/TargetTransformInfo.h
690–691	I opened https://reviews.llvm.org/D133691 to address this.
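
The direction discussed in these inline comments, keeping the limit in a lowering-oriented class instead of `TargetTransformInfo`, might look roughly like the sketch below. Everything here is hypothetical: `LoweringInfo` and `Example64BitLowering` are not LLVM classes, the setter is invented, and only the accessor name `maxSupportedDivRemBitWidth` is taken from the comment above. The follow-up revision may use different names.

```cpp
#include <cassert>

// Hypothetical stand-in for llvm::IntegerType::MAX_INT_BITS (value assumed).
constexpr unsigned kMaxIntBits = 1u << 23;

// Hypothetical lowering-information base class holding the limit as state.
class LoweringInfo {
  unsigned MaxSupportedDivRemBitWidth = kMaxIntBits; // default: no expansion

protected:
  // Targets call this from their constructor to declare what codegen supports.
  void setMaxSupportedDivRemBitWidth(unsigned Bits) {
    MaxSupportedDivRemBitWidth = Bits;
  }

public:
  unsigned maxSupportedDivRemBitWidth() const {
    return MaxSupportedDivRemBitWidth;
  }
};

// Example target whose codegen handles div/rem up to 128 bits.
class Example64BitLowering : public LoweringInfo {
public:
  Example64BitLowering() { setMaxSupportedDivRemBitWidth(128); }
};
```

The name reflects the second comment: the value is what codegen supports, not what is legal in the usual sense of the term.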

Revision Contents

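Path                                                                Size

llvm/include/llvm/Analysis/TargetTransformInfo.h                    7 lines
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h                4 lines
llvm/lib/Analysis/TargetTransformInfo.cpp                           4 lines
llvm/lib/CodeGen/ExpandLargeDivRem.cpp                              43 lines
llvm/lib/CodeGen/TargetPassConfig.cpp                               1 line
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h                2 lines
llvm/lib/Target/ARM/ARMTargetTransformInfo.h                        2 lines
llvm/lib/Target/X86/X86TargetTransformInfo.h                        1 line
llvm/lib/Target/X86/X86TargetTransformInfo.cpp                      4 lines
llvm/test/CodeGen/AArch64/O0-pipeline.ll                            1 line
llvm/test/CodeGen/AArch64/O3-pipeline.ll                            1 line
llvm/test/CodeGen/AArch64/udivmodei5.ll                             44 lines
llvm/test/CodeGen/ARM/O3-pipeline.ll                                1 line
llvm/test/CodeGen/ARM/udivmodei5.ll                                 44 lines
llvm/test/CodeGen/X86/O0-pipeline.ll                                1 line
llvm/test/CodeGen/X86/div-rem-pair-recomposition-signed.ll          97 lines
llvm/test/CodeGen/X86/div-rem-pair-recomposition-unsigned.ll        97 lines
(file name not preserved)                                           36 lines
(file name not preserved)                                           72 lines
(file name not preserved)                                           1 line
(file name not preserved)                                           20 lines
(file name not preserved)                                           70 lines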

Diff 458169

llvm/include/llvm/Analysis/TargetTransformInfo.h

[681 lines not shown; context: public:]

   /// Return true if the target has a unified operation to calculate division
   /// and remainder. If so, the additional implicit multiplication and
   /// subtraction required to calculate a remainder from division are free. This
   /// can enable more aggressive transformations for division and remainder than
   /// would typically be allowed using throughput or size cost models.
   bool hasDivRemOp(Type *DataType, bool IsSigned) const;

+  /// Returns the maximum bitwidth of legal div and rem instructions.
+  unsigned maxLegalDivRemBitWidth() const;
+
   /// Return true if the given instruction (assumed to be a memory access
   /// instruction) has a volatile variant. If that's the case then we can avoid
   /// addrspacecast to generic AS for volatile loads/stores. Default
   /// implementation returns false, which prevents address space inference for
   /// volatile loads/stores.
   bool hasVolatileVariant(Instruction *I, unsigned AddrSpace) const;

   /// Return true if target doesn't mind addresses in vectors.

[938 lines not shown; context: virtual bool forceScalarizeMaskedScatter(VectorType *DataType,]

                                            Align Alignment) = 0;
   virtual bool isLegalMaskedCompressStore(Type *DataType) = 0;
   virtual bool isLegalMaskedExpandLoad(Type *DataType) = 0;
   virtual bool isLegalAltInstr(VectorType *VecTy, unsigned Opcode0,
                                unsigned Opcode1,
                                const SmallBitVector &OpcodeMask) const = 0;
   virtual bool enableOrderedReductions() = 0;
   virtual bool hasDivRemOp(Type *DataType, bool IsSigned) = 0;
+  virtual unsigned maxLegalDivRemBitWidth() = 0;
   virtual bool hasVolatileVariant(Instruction *I, unsigned AddrSpace) = 0;
   virtual bool prefersVectorizedAddressing() = 0;
   virtual InstructionCost getScalingFactorCost(Type *Ty, GlobalValue *BaseGV,
                                                int64_t BaseOffset,
                                                bool HasBaseReg, int64_t Scale,
                                                unsigned AddrSpace) = 0;
   virtual bool LSRWithInstrQueries() = 0;
   virtual bool isTruncateFree(Type *Ty1, Type *Ty2) = 0;

[431 lines not shown; context: bool isLegalAltInstr(VectorType *VecTy, unsigned Opcode0, unsigned Opcode1,]

     return Impl.isLegalAltInstr(VecTy, Opcode0, Opcode1, OpcodeMask);
   }
   bool enableOrderedReductions() override {
     return Impl.enableOrderedReductions();
   }
   bool hasDivRemOp(Type *DataType, bool IsSigned) override {
     return Impl.hasDivRemOp(DataType, IsSigned);
   }
+  unsigned maxLegalDivRemBitWidth() override {
+    return Impl.maxLegalDivRemBitWidth();
+  }
   bool hasVolatileVariant(Instruction *I, unsigned AddrSpace) override {
     return Impl.hasVolatileVariant(I, AddrSpace);
   }
   bool prefersVectorizedAddressing() override {
     return Impl.prefersVectorizedAddressing();
   }
   InstructionCost getScalingFactorCost(Type *Ty, GlobalValue *BaseGV,
                                        int64_t BaseOffset, bool HasBaseReg,

[549 lines not shown]

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h

[285 lines not shown; context: public:]

   }

   bool isLegalMaskedExpandLoad(Type *DataType) const { return false; }

   bool enableOrderedReductions() const { return false; }

   bool hasDivRemOp(Type *DataType, bool IsSigned) const { return false; }

+  bool maxLegalDivRemBitWidth() const {
+    return llvm::IntegerType::MAX_INT_BITS;
+  }
+
   bool hasVolatileVariant(Instruction *I, unsigned AddrSpace) const {
     return false;
   }

   bool prefersVectorizedAddressing() const { return true; }

   InstructionCost getScalingFactorCost(Type *Ty, GlobalValue *BaseGV,
                                        int64_t BaseOffset, bool HasBaseReg,

[985 lines not shown]

llvm/lib/Analysis/TargetTransformInfo.cpp

[445 lines not shown]

 bool TargetTransformInfo::enableOrderedReductions() const {
   return TTIImpl->enableOrderedReductions();
 }

 bool TargetTransformInfo::hasDivRemOp(Type *DataType, bool IsSigned) const {
   return TTIImpl->hasDivRemOp(DataType, IsSigned);
 }

+unsigned TargetTransformInfo::maxLegalDivRemBitWidth() const {
+  return TTIImpl->maxLegalDivRemBitWidth();
+}
+
 bool TargetTransformInfo::hasVolatileVariant(Instruction *I,
                                              unsigned AddrSpace) const {
   return TTIImpl->hasVolatileVariant(I, AddrSpace);
 }

 bool TargetTransformInfo::prefersVectorizedAddressing() const {
   return TTIImpl->prefersVectorizedAddressing();
 }

[749 lines not shown]

llvm/lib/CodeGen/ExpandLargeDivRem.cpp

[12 lines not shown]

 // with more than 64 bits.
 //
 //===----------------------------------------------------------------------===//

 #include "llvm/CodeGen/ExpandLargeDivRem.h"
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/StringExtras.h"
 #include "llvm/Analysis/GlobalsModRef.h"
+#include "llvm/Analysis/TargetTransformInfo.h"
 #include "llvm/CodeGen/Passes.h"
 #include "llvm/IR/IRBuilder.h"
 #include "llvm/IR/InstIterator.h"
 #include "llvm/IR/PassManager.h"
 #include "llvm/InitializePasses.h"
 #include "llvm/Pass.h"
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Transforms/Utils/IntegerDivision.h"

 using namespace llvm;

 static cl::opt<unsigned>
-    ExpandDivRemBits("expand-div-rem-bits", cl::Hidden, cl::init(128),
+    ExpandDivRemBits("expand-div-rem-bits", cl::Hidden,
+                     cl::init(llvm::IntegerType::MAX_INT_BITS),
                      cl::desc("div and rem instructions on integers with "
                               "more than <N> bits are expanded."));

-static bool runImpl(Function &F) {
+static bool isConstantPowerOfTwo(llvm::Value *V, bool SignedOp) {
+  auto *C = dyn_cast<ConstantInt>(V);
+  if (!C)
+    return false;
+
+  APInt Val = C->getValue();
+  if (SignedOp && Val.isNegative())
+    Val = -Val;
+  return Val.isPowerOf2();
+}
+
+static bool isSigned(unsigned int Opcode) {
+  return Opcode == Instruction::SDiv || Opcode == Instruction::SRem;
+}
+
+static bool runImpl(Function &F, const TargetTransformInfo &TTI) {
   SmallVector<BinaryOperator *, 4> Replace;
   bool Modified = false;

+  unsigned MaxLegalDivRemBitWidth = TTI.maxLegalDivRemBitWidth();
+  if (ExpandDivRemBits != llvm::IntegerType::MAX_INT_BITS)
+    MaxLegalDivRemBitWidth = ExpandDivRemBits;
+
+  if (MaxLegalDivRemBitWidth >= llvm::IntegerType::MAX_INT_BITS)
+    return false;
+
   for (auto &I : instructions(F)) {
     switch (I.getOpcode()) {
     case Instruction::UDiv:
     case Instruction::SDiv:
     case Instruction::URem:
     case Instruction::SRem: {
       // TODO: This doesn't handle vectors.
       auto *IntTy = dyn_cast<IntegerType>(I.getType());
-      if (!IntTy || IntTy->getIntegerBitWidth() <= ExpandDivRemBits)
+      if (!IntTy || IntTy->getIntegerBitWidth() <= MaxLegalDivRemBitWidth)
+        continue;
+
+      // The backend has peephole optimizations for powers of two.
+      if (isConstantPowerOfTwo(I.getOperand(1), isSigned(I.getOpcode())))
         continue;

       Replace.push_back(&cast<BinaryOperator>(I));
       Modified = true;
       break;
     }
     default:
       break;

[14 lines not shown; context: while (!Replace.empty()) {]

     }
   }

   return Modified;
 }

 PreservedAnalyses ExpandLargeDivRemPass::run(Function &F,
                                              FunctionAnalysisManager &AM) {
-  bool Changed = runImpl(F);
+  TargetTransformInfo &TTI = AM.getResult<TargetIRAnalysis>(F);
+  bool Changed = runImpl(F, TTI);

   if (Changed)
     return PreservedAnalyses::none();

   return PreservedAnalyses::all();
 }

 class ExpandLargeDivRemLegacyPass : public FunctionPass {
 public:
   static char ID;

   ExpandLargeDivRemLegacyPass() : FunctionPass(ID) {
     initializeExpandLargeDivRemLegacyPassPass(*PassRegistry::getPassRegistry());
   }

-  bool runOnFunction(Function &F) override { return runImpl(F); }
+  bool runOnFunction(Function &F) override {
+    auto &TTI = getAnalysis<TargetTransformInfoWrapperPass>().getTTI(F);
+    return runImpl(F, TTI);
+  }

   void getAnalysisUsage(AnalysisUsage &AU) const override {
+    AU.addRequired<TargetTransformInfoWrapperPass>();
     AU.addPreserved<AAResultsWrapperPass>();
     AU.addPreserved<GlobalsAAWrapperPass>();
   }
 };

 char ExpandLargeDivRemLegacyPass::ID = 0;
 INITIALIZE_PASS_BEGIN(ExpandLargeDivRemLegacyPass, "expand-large-div-rem",
                       "Expand large div/rem", false, false)
 INITIALIZE_PASS_END(ExpandLargeDivRemLegacyPass, "expand-large-div-rem",
                     "Expand large div/rem", false, false)

FunctionPass *llvm::createExpandLargeDivRemPass() {		FunctionPass *llvm::createExpandLargeDivRemPass() {
return new ExpandLargeDivRemLegacyPass();		return new ExpandLargeDivRemLegacyPass();
}		}
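
The filtering logic in `runImpl` above can be modelled outside of LLVM. The following is a minimal sketch, not the pass itself: `kNoLimit`, `DivRem`, `effectiveThreshold` and `shouldExpand` are hypothetical names introduced for illustration, the constant divisor is narrowed to 64 bits (the pass uses `APInt` of arbitrary width), and `kNoLimit` only stands in for `llvm::IntegerType::MAX_INT_BITS`.

```cpp
#include <cstdint>

// Hypothetical stand-in for llvm::IntegerType::MAX_INT_BITS ("never expand").
constexpr unsigned kNoLimit = 1u << 23;

enum class Op { UDiv, SDiv, URem, SRem };

inline bool isSigned(Op O) { return O == Op::SDiv || O == Op::SRem; }

// Mirrors isConstantPowerOfTwo from the diff, for a 64-bit constant only.
// For signed operations the magnitude of a negative divisor is tested.
inline bool isConstantPowerOfTwo(int64_t V, bool SignedOp) {
  uint64_t U = static_cast<uint64_t>(V);
  if (SignedOp && V < 0)
    U = 0 - U; // negate in unsigned arithmetic, which wraps like APInt
  return U != 0 && (U & (U - 1)) == 0;
}

// The command-line value (FlagValue) overrides the target hook when it has
// been changed from its default of kNoLimit.
inline unsigned effectiveThreshold(unsigned TargetMax, unsigned FlagValue) {
  return FlagValue != kNoLimit ? FlagValue : TargetMax;
}

// Simplified description of one div/rem instruction (assumption: scalar only).
struct DivRem {
  Op Opcode;
  unsigned BitWidth;
  bool HasConstDivisor;
  int64_t Divisor;
};

inline bool shouldExpand(const DivRem &I, unsigned TargetMax,
                         unsigned FlagValue = kNoLimit) {
  unsigned Max = effectiveThreshold(TargetMax, FlagValue);
  if (Max >= kNoLimit)
    return false; // no expansion requested for this target
  if (I.BitWidth <= Max)
    return false; // the backend can legalize this width itself
  // Power-of-two divisors are left for the backend's peephole optimizations.
  if (I.HasConstDivisor && isConstantPowerOfTwo(I.Divisor, isSigned(I.Opcode)))
    return false;
  return true;
}
```

Under these assumptions, with a target limit of 128, `shouldExpand({Op::UDiv, 129, false, 0}, 128)` is true, while the same instruction at 128 bits is left alone.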

llvm/lib/CodeGen/TargetPassConfig.cpp

	}			}

	bool TargetPassConfig::addISelPasses() {			bool TargetPassConfig::addISelPasses() {
	if (TM->useEmulatedTLS())			if (TM->useEmulatedTLS())
	addPass(createLowerEmuTLSPass());			addPass(createLowerEmuTLSPass());

	addPass(createPreISelIntrinsicLoweringPass());			addPass(createPreISelIntrinsicLoweringPass());
	PM->add(createTargetTransformInfoWrapperPass(TM->getTargetIRAnalysis()));			PM->add(createTargetTransformInfoWrapperPass(TM->getTargetIRAnalysis()));
				addPass(createExpandLargeDivRemPass());
	addIRPasses();			addIRPasses();
	addCodeGenPrepare();			addCodeGenPrepare();
	addPassesToHandleExceptions();			addPassesToHandleExceptions();
	addISelPrepare();			addISelPrepare();

	return addCoreISelPasses();			return addCoreISelPasses();
	}			}

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h

Show First 20 Lines • Show All 313 Lines • ▼ Show 20 Lines	if (auto *DataTypeVTy = dyn_cast<VectorType>(DataType)) {
return NumElements > 1 && isPowerOf2_64(NumElements) && EltSize >= 8 &&		return NumElements > 1 && isPowerOf2_64(NumElements) && EltSize >= 8 &&
EltSize <= 128 && isPowerOf2_64(EltSize);		EltSize <= 128 && isPowerOf2_64(EltSize);
}		}
return BaseT::isLegalNTStore(DataType, Alignment);		return BaseT::isLegalNTStore(DataType, Alignment);
}		}

bool enableOrderedReductions() const { return true; }		bool enableOrderedReductions() const { return true; }

		unsigned maxLegalDivRemBitWidth() const { return 128; }

InstructionCost getInterleavedMemoryOpCost(		InstructionCost getInterleavedMemoryOpCost(
unsigned Opcode, Type *VecTy, unsigned Factor, ArrayRef<unsigned> Indices,		unsigned Opcode, Type *VecTy, unsigned Factor, ArrayRef<unsigned> Indices,
Align Alignment, unsigned AddressSpace, TTI::TargetCostKind CostKind,		Align Alignment, unsigned AddressSpace, TTI::TargetCostKind CostKind,
bool UseMaskForCond = false, bool UseMaskForGaps = false);		bool UseMaskForCond = false, bool UseMaskForGaps = false);

bool		bool
shouldConsiderAddressTypePromotion(const Instruction &I,		shouldConsiderAddressTypePromotion(const Instruction &I,
bool &AllowPromotionWithoutCommonHeader);		bool &AllowPromotionWithoutCommonHeader);

llvm/lib/Target/ARM/ARMTargetTransformInfo.h

Show First 20 Lines • Show All 201 Lines • ▼ Show 20 Lines	public:
}		}

bool isLegalMaskedGather(Type *Ty, Align Alignment);		bool isLegalMaskedGather(Type *Ty, Align Alignment);

bool isLegalMaskedScatter(Type *Ty, Align Alignment) {		bool isLegalMaskedScatter(Type *Ty, Align Alignment) {
return isLegalMaskedGather(Ty, Alignment);		return isLegalMaskedGather(Ty, Alignment);
}		}

		unsigned maxLegalDivRemBitWidth() const { return 64; }

InstructionCost getMemcpyCost(const Instruction *I);		InstructionCost getMemcpyCost(const Instruction *I);

int getNumMemOps(const IntrinsicInst *I) const;		int getNumMemOps(const IntrinsicInst *I) const;

InstructionCost getShuffleCost(TTI::ShuffleKind Kind, VectorType *Tp,		InstructionCost getShuffleCost(TTI::ShuffleKind Kind, VectorType *Tp,
ArrayRef<int> Mask,		ArrayRef<int> Mask,
TTI::TargetCostKind CostKind, int Index,		TTI::TargetCostKind CostKind, int Index,
VectorType *SubTp,		VectorType *SubTp,

llvm/lib/Target/X86/X86TargetTransformInfo.h

Show First 20 Lines • Show All 249 Lines • ▼ Show 20 Lines	public:
bool isLegalMaskedGather(Type *DataType, Align Alignment);		bool isLegalMaskedGather(Type *DataType, Align Alignment);
bool isLegalMaskedScatter(Type *DataType, Align Alignment);		bool isLegalMaskedScatter(Type *DataType, Align Alignment);
bool isLegalMaskedExpandLoad(Type *DataType);		bool isLegalMaskedExpandLoad(Type *DataType);
bool isLegalMaskedCompressStore(Type *DataType);		bool isLegalMaskedCompressStore(Type *DataType);
bool isLegalAltInstr(VectorType *VecTy, unsigned Opcode0, unsigned Opcode1,		bool isLegalAltInstr(VectorType *VecTy, unsigned Opcode0, unsigned Opcode1,
const SmallBitVector &OpcodeMask) const;		const SmallBitVector &OpcodeMask) const;
bool hasDivRemOp(Type *DataType, bool IsSigned);		bool hasDivRemOp(Type *DataType, bool IsSigned);
bool isExpensiveToSpeculativelyExecute(const Instruction *I);		bool isExpensiveToSpeculativelyExecute(const Instruction *I);
		unsigned maxLegalDivRemBitWidth() const;
bool isFCmpOrdCheaperThanFCmpZero(Type *Ty);		bool isFCmpOrdCheaperThanFCmpZero(Type *Ty);
bool areInlineCompatible(const Function *Caller,		bool areInlineCompatible(const Function *Caller,
const Function *Callee) const;		const Function *Callee) const;
bool areTypesABICompatible(const Function Caller, const Function Callee,		bool areTypesABICompatible(const Function Caller, const Function Callee,
const ArrayRef<Type *> &Type) const;		const ArrayRef<Type *> &Type) const;
TTI::MemCmpExpansionOptions enableMemCmpExpansion(bool OptSize,		TTI::MemCmpExpansionOptions enableMemCmpExpansion(bool OptSize,
bool IsZeroCmp) const;		bool IsZeroCmp) const;
bool prefersVectorizedAddressing() const;		bool prefersVectorizedAddressing() const;

llvm/lib/Target/X86/X86TargetTransformInfo.cpp

Show First 20 Lines • Show All 5,728 Lines • ▼ Show 20 Lines	bool X86TTIImpl::isExpensiveToSpeculativelyExecute(const Instruction* I) {
// FDIV is always expensive, even if it has a very low uop count.		// FDIV is always expensive, even if it has a very low uop count.
// TODO: Still necessary for recent CPUs with low latency/throughput fdiv?		// TODO: Still necessary for recent CPUs with low latency/throughput fdiv?
if (I->getOpcode() == Instruction::FDiv)		if (I->getOpcode() == Instruction::FDiv)
return true;		return true;

return BaseT::isExpensiveToSpeculativelyExecute(I);		return BaseT::isExpensiveToSpeculativelyExecute(I);
}		}

		unsigned X86TTIImpl::maxLegalDivRemBitWidth() const {
		return ST->is64Bit() ? 128 : 64;
		}

bool X86TTIImpl::isFCmpOrdCheaperThanFCmpZero(Type *Ty) {		bool X86TTIImpl::isFCmpOrdCheaperThanFCmpZero(Type *Ty) {
return false;		return false;
}		}

bool X86TTIImpl::areInlineCompatible(const Function *Caller,		bool X86TTIImpl::areInlineCompatible(const Function *Caller,
const Function *Callee) const {		const Function *Callee) const {
const TargetMachine &TM = getTLI()->getTargetMachine();		const TargetMachine &TM = getTLI()->getTargetMachine();

llvm/test/CodeGen/AArch64/O0-pipeline.ll

	; CHECK-NEXT: Target Transform Information			; CHECK-NEXT: Target Transform Information
	; CHECK-NEXT: Create Garbage Collector Module Metadata			; CHECK-NEXT: Create Garbage Collector Module Metadata
	; CHECK-NEXT: Assumption Cache Tracker			; CHECK-NEXT: Assumption Cache Tracker
	; CHECK-NEXT: Profile summary info			; CHECK-NEXT: Profile summary info
	; CHECK-NEXT: Machine Branch Probability Analysis			; CHECK-NEXT: Machine Branch Probability Analysis
	; CHECK-NEXT: ModulePass Manager			; CHECK-NEXT: ModulePass Manager
	; CHECK-NEXT: Pre-ISel Intrinsic Lowering			; CHECK-NEXT: Pre-ISel Intrinsic Lowering
	; CHECK-NEXT: FunctionPass Manager			; CHECK-NEXT: FunctionPass Manager
				; CHECK-NEXT: Expand large div/rem
	; CHECK-NEXT: Expand Atomic instructions			; CHECK-NEXT: Expand Atomic instructions
	; CHECK-NEXT: Module Verifier			; CHECK-NEXT: Module Verifier
	; CHECK-NEXT: Lower Garbage Collection Instructions			; CHECK-NEXT: Lower Garbage Collection Instructions
	; CHECK-NEXT: Shadow Stack GC Lowering			; CHECK-NEXT: Shadow Stack GC Lowering
	; CHECK-NEXT: Lower constant intrinsics			; CHECK-NEXT: Lower constant intrinsics
	; CHECK-NEXT: Remove unreachable blocks from the CFG			; CHECK-NEXT: Remove unreachable blocks from the CFG
	; CHECK-NEXT: Expand vector predication intrinsics			; CHECK-NEXT: Expand vector predication intrinsics
	; CHECK-NEXT: Scalarize Masked Memory Intrinsics			; CHECK-NEXT: Scalarize Masked Memory Intrinsics

llvm/test/CodeGen/AArch64/O3-pipeline.ll

	; CHECK-NEXT: Type-Based Alias Analysis			; CHECK-NEXT: Type-Based Alias Analysis
	; CHECK-NEXT: Scoped NoAlias Alias Analysis			; CHECK-NEXT: Scoped NoAlias Alias Analysis
	; CHECK-NEXT: Create Garbage Collector Module Metadata			; CHECK-NEXT: Create Garbage Collector Module Metadata
	; CHECK-NEXT: Machine Branch Probability Analysis			; CHECK-NEXT: Machine Branch Probability Analysis
	; CHECK-NEXT: Default Regalloc Eviction Advisor			; CHECK-NEXT: Default Regalloc Eviction Advisor
	; CHECK-NEXT: ModulePass Manager			; CHECK-NEXT: ModulePass Manager
	; CHECK-NEXT: Pre-ISel Intrinsic Lowering			; CHECK-NEXT: Pre-ISel Intrinsic Lowering
	; CHECK-NEXT: FunctionPass Manager			; CHECK-NEXT: FunctionPass Manager
				; CHECK-NEXT: Expand large div/rem
	; CHECK-NEXT: Expand Atomic instructions			; CHECK-NEXT: Expand Atomic instructions
	; CHECK-NEXT: SVE intrinsics optimizations			; CHECK-NEXT: SVE intrinsics optimizations
	; CHECK-NEXT: FunctionPass Manager			; CHECK-NEXT: FunctionPass Manager
	; CHECK-NEXT: Dominator Tree Construction			; CHECK-NEXT: Dominator Tree Construction
	; CHECK-NEXT: FunctionPass Manager			; CHECK-NEXT: FunctionPass Manager
	; CHECK-NEXT: Simplify the CFG			; CHECK-NEXT: Simplify the CFG
	; CHECK-NEXT: Dominator Tree Construction			; CHECK-NEXT: Dominator Tree Construction
	; CHECK-NEXT: Natural Loop Information			; CHECK-NEXT: Natural Loop Information

llvm/test/CodeGen/AArch64/udivmodei5.ll

This file was added.

				; RUN: llc -mtriple=aarch64-linux-gnuabi < %s | FileCheck %s

				define i65 @udiv65(i65 %a, i65 %b) nounwind {
				; CHECK-LABEL: udiv65:
				; CHECK-NOT: call
				%res = udiv i65 %a, %b
				ret i65 %res
				}

				define i129 @udiv129(i129 %a, i129 %b) nounwind {
				; CHECK-LABEL: udiv129:
				; CHECK-NOT: call
				%res = udiv i129 %a, %b
				ret i129 %res
				}

				define i129 @urem129(i129 %a, i129 %b) nounwind {
				; CHECK-LABEL: urem129:
				; CHECK-NOT: call
				%res = urem i129 %a, %b
				ret i129 %res
				}

				define i129 @sdiv129(i129 %a, i129 %b) nounwind {
				; CHECK-LABEL: sdiv129:
				; CHECK-NOT: call
				%res = sdiv i129 %a, %b
				ret i129 %res
				}

				define i129 @srem129(i129 %a, i129 %b) nounwind {
				; CHECK-LABEL: srem129:
				; CHECK-NOT: call
				%res = srem i129 %a, %b
				ret i129 %res
				}

				; Some higher sizes
				define i257 @sdiv257(i257 %a, i257 %b) nounwind {
				; CHECK-LABEL: sdiv257:
				; CHECK-NOT: call
				%res = sdiv i257 %a, %b
				ret i257 %res
				}
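
The `CHECK-NOT: call` lines above assert that no runtime library call is emitted, so the quotient has to be computed inline. As a rough illustration of the kind of loop such an expansion can use, here is a bitwise shift-subtract division. This is a sketch under stated assumptions: it is fixed at 64 bits so that it can be checked against the native operators, `UDivRem` and `udivremShiftSubtract` are hypothetical names, and it is not claimed to match the IR that the pass actually generates.

```cpp
#include <cassert>
#include <cstdint>

// Quotient and remainder of an unsigned division.
struct UDivRem {
  uint64_t Quot;
  uint64_t Rem;
};

// Restoring shift-subtract division: one quotient bit per iteration,
// most significant bit first.
inline UDivRem udivremShiftSubtract(uint64_t N, uint64_t D) {
  assert(D != 0 && "division by zero is undefined");
  uint64_t Q = 0;
  uint64_t R = 0;
  for (int Bit = 63; Bit >= 0; --Bit) {
    // Remember the bit that the shift below pushes out of the remainder.
    uint64_t Carry = R >> 63;
    R = (R << 1) | ((N >> Bit) & 1);
    // If the widened remainder is at least D, subtract and set the bit.
    // When Carry is set, the wrapped subtraction still gives the right value
    // because the true remainder is below 2 * D.
    if (Carry || R >= D) {
      R -= D;
      Q |= uint64_t{1} << Bit;
    }
  }
  return {Q, R};
}
```

For example, `udivremShiftSubtract(100, 7)` yields a quotient of 14 and a remainder of 2. A wide-integer version would run the same loop over the full bit width of the type.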

llvm/test/CodeGen/ARM/O3-pipeline.ll

	; RUN: llc -mtriple=arm -O3 -debug-pass=Structure < %s -o /dev/null 2>&1 | grep -v "Verify generated machine code" | FileCheck %s			; RUN: llc -mtriple=arm -O3 -debug-pass=Structure < %s -o /dev/null 2>&1 | grep -v "Verify generated machine code" | FileCheck %s

	; REQUIRES: asserts			; REQUIRES: asserts

	; CHECK: ModulePass Manager			; CHECK: ModulePass Manager
	; CHECK-NEXT: Pre-ISel Intrinsic Lowering			; CHECK-NEXT: Pre-ISel Intrinsic Lowering
	; CHECK-NEXT: FunctionPass Manager			; CHECK-NEXT: FunctionPass Manager
				; CHECK-NEXT: Expand large div/rem
	; CHECK-NEXT: Expand Atomic instructions			; CHECK-NEXT: Expand Atomic instructions
	; CHECK-NEXT: Simplify the CFG			; CHECK-NEXT: Simplify the CFG
	; CHECK-NEXT: Dominator Tree Construction			; CHECK-NEXT: Dominator Tree Construction
	; CHECK-NEXT: Natural Loop Information			; CHECK-NEXT: Natural Loop Information
	; CHECK-NEXT: MVE gather/scatter lowering			; CHECK-NEXT: MVE gather/scatter lowering
	; CHECK-NEXT: MVE lane interleaving			; CHECK-NEXT: MVE lane interleaving
	; CHECK-NEXT: Module Verifier			; CHECK-NEXT: Module Verifier
	; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)			; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)

llvm/test/CodeGen/ARM/udivmodei5.ll

This file was added.

				; RUN: llc -mtriple=arm-eabi < %s | FileCheck %s

				define i65 @udiv65(i65 %a, i65 %b) nounwind {
				; CHECK-LABEL: udiv65:
				; CHECK-NOT: call
				%res = udiv i65 %a, %b
				ret i65 %res
				}

				define i129 @udiv129(i129 %a, i129 %b) nounwind {
				; CHECK-LABEL: udiv129:
				; CHECK-NOT: call
				%res = udiv i129 %a, %b
				ret i129 %res
				}

				define i129 @urem129(i129 %a, i129 %b) nounwind {
				; CHECK-LABEL: urem129:
				; CHECK-NOT: call
				%res = urem i129 %a, %b
				ret i129 %res
				}

				define i129 @sdiv129(i129 %a, i129 %b) nounwind {
				; CHECK-LABEL: sdiv129:
				; CHECK-NOT: call
				%res = sdiv i129 %a, %b
				ret i129 %res
				}

				define i129 @srem129(i129 %a, i129 %b) nounwind {
				; CHECK-LABEL: srem129:
				; CHECK-NOT: call
				%res = srem i129 %a, %b
				ret i129 %res
				}

				; Some higher sizes
				define i257 @sdiv257(i257 %a, i257 %b) nounwind {
				; CHECK-LABEL: sdiv257:
				; CHECK-NOT: call
				%res = sdiv i257 %a, %b
				ret i257 %res
				}

llvm/test/CodeGen/X86/O0-pipeline.ll

	; CHECK-NEXT: Target Transform Information			; CHECK-NEXT: Target Transform Information
	; CHECK-NEXT: Create Garbage Collector Module Metadata			; CHECK-NEXT: Create Garbage Collector Module Metadata
	; CHECK-NEXT: Assumption Cache Tracker			; CHECK-NEXT: Assumption Cache Tracker
	; CHECK-NEXT: Profile summary info			; CHECK-NEXT: Profile summary info
	; CHECK-NEXT: Machine Branch Probability Analysis			; CHECK-NEXT: Machine Branch Probability Analysis
	; CHECK-NEXT: ModulePass Manager			; CHECK-NEXT: ModulePass Manager
	; CHECK-NEXT: Pre-ISel Intrinsic Lowering			; CHECK-NEXT: Pre-ISel Intrinsic Lowering
	; CHECK-NEXT: FunctionPass Manager			; CHECK-NEXT: FunctionPass Manager
				; CHECK-NEXT: Expand large div/rem
	; CHECK-NEXT: Expand Atomic instructions			; CHECK-NEXT: Expand Atomic instructions
	; CHECK-NEXT: Lower AMX intrinsics			; CHECK-NEXT: Lower AMX intrinsics
	; CHECK-NEXT: Lower AMX type for load/store			; CHECK-NEXT: Lower AMX type for load/store
	; CHECK-NEXT: Module Verifier			; CHECK-NEXT: Module Verifier
	; CHECK-NEXT: Lower Garbage Collection Instructions			; CHECK-NEXT: Lower Garbage Collection Instructions
	; CHECK-NEXT: Shadow Stack GC Lowering			; CHECK-NEXT: Shadow Stack GC Lowering
	; CHECK-NEXT: Lower constant intrinsics			; CHECK-NEXT: Lower constant intrinsics
	; CHECK-NEXT: Remove unreachable blocks from the CFG			; CHECK-NEXT: Remove unreachable blocks from the CFG

llvm/test/CodeGen/X86/div-rem-pair-recomposition-signed.ll

Show First 20 Lines • Show All 165 Lines • ▼ Show 20 Lines	; X64-NEXT: retq
store i64 %div, ptr %divdst, align 4		store i64 %div, ptr %divdst, align 4
%t1 = mul i64 %div, %y		%t1 = mul i64 %div, %y
%t2 = sub i64 %x, %t1		%t2 = sub i64 %x, %t1
ret i64 %t2		ret i64 %t2
}		}

define i128 @scalar_i128(i128 %x, i128 %y, ptr %divdst) nounwind {		define i128 @scalar_i128(i128 %x, i128 %y, ptr %divdst) nounwind {
; X86-LABEL: scalar_i128:		; X86-LABEL: scalar_i128:
; X86: # %bb.0:		; X86 doesn't have __divti3, so the urem is expanded into a loop.
; X86-NEXT: pushl %ebp		; X86: udiv-do-while
; X86-NEXT: movl %esp, %ebp
; X86-NEXT: pushl %ebx
; X86-NEXT: pushl %edi
; X86-NEXT: pushl %esi
; X86-NEXT: andl $-8, %esp
; X86-NEXT: subl $40, %esp
; X86-NEXT: movl 44(%ebp), %edi
; X86-NEXT: leal {{[0-9]+}}(%esp), %eax
; X86-NEXT: pushl 40(%ebp)
; X86-NEXT: pushl 36(%ebp)
; X86-NEXT: pushl 32(%ebp)
; X86-NEXT: pushl 28(%ebp)
; X86-NEXT: pushl 24(%ebp)
; X86-NEXT: pushl 20(%ebp)
; X86-NEXT: pushl 16(%ebp)
; X86-NEXT: pushl 12(%ebp)
; X86-NEXT: pushl %eax
; X86-NEXT: calll __divti3
; X86-NEXT: addl $32, %esp
; X86-NEXT: movl {{[0-9]+}}(%esp), %ebx
; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
; X86-NEXT: movl {{[0-9]+}}(%esp), %esi
; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
; X86-NEXT: movl %edi, %edx
; X86-NEXT: movl %ecx, 12(%edi)
; X86-NEXT: movl %esi, 8(%edi)
; X86-NEXT: movl %eax, 4(%edi)
; X86-NEXT: movl %eax, %edi
; X86-NEXT: movl %ebx, (%edx)
; X86-NEXT: movl 28(%ebp), %eax
; X86-NEXT: imull %eax, %ecx
; X86-NEXT: mull %esi
; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NEXT: addl %ecx, %edx
; X86-NEXT: imull 32(%ebp), %esi
; X86-NEXT: addl %edx, %esi
; X86-NEXT: movl 36(%ebp), %eax
; X86-NEXT: movl %eax, %ecx
; X86-NEXT: imull %edi, %ecx
; X86-NEXT: mull %ebx
; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NEXT: addl %ecx, %edx
; X86-NEXT: movl 40(%ebp), %eax
; X86-NEXT: imull %ebx, %eax
; X86-NEXT: addl %edx, %eax
; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %ecx # 4-byte Reload
; X86-NEXT: addl %ecx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Folded Spill
; X86-NEXT: adcl %esi, %eax
; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NEXT: movl %ebx, %eax
; X86-NEXT: movl 28(%ebp), %ecx
; X86-NEXT: mull %ecx
; X86-NEXT: movl %edx, (%esp) # 4-byte Spill
; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NEXT: movl %edi, %eax
; X86-NEXT: mull %ecx
; X86-NEXT: movl %edx, %ecx
; X86-NEXT: movl %eax, %esi
; X86-NEXT: addl (%esp), %esi # 4-byte Folded Reload
; X86-NEXT: adcl $0, %ecx
; X86-NEXT: movl %ebx, %eax
; X86-NEXT: mull 32(%ebp)
; X86-NEXT: movl %edx, %ebx
; X86-NEXT: addl %esi, %eax
; X86-NEXT: movl %eax, (%esp) # 4-byte Spill
; X86-NEXT: adcl %ecx, %ebx
; X86-NEXT: setb %cl
; X86-NEXT: movl %edi, %eax
; X86-NEXT: mull 32(%ebp)
; X86-NEXT: addl %ebx, %eax
; X86-NEXT: movzbl %cl, %ecx
; X86-NEXT: adcl %ecx, %edx
; X86-NEXT: addl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Folded Reload
; X86-NEXT: adcl {{[-0-9]+}}(%e{{[sb]}}p), %edx # 4-byte Folded Reload
; X86-NEXT: movl 12(%ebp), %ecx
; X86-NEXT: subl {{[-0-9]+}}(%e{{[sb]}}p), %ecx # 4-byte Folded Reload
; X86-NEXT: movl 16(%ebp), %esi
; X86-NEXT: sbbl (%esp), %esi # 4-byte Folded Reload
; X86-NEXT: movl 20(%ebp), %edi
; X86-NEXT: sbbl %eax, %edi
; X86-NEXT: movl 24(%ebp), %ebx
; X86-NEXT: sbbl %edx, %ebx
; X86-NEXT: movl 8(%ebp), %eax
; X86-NEXT: movl %ecx, (%eax)
; X86-NEXT: movl %esi, 4(%eax)
; X86-NEXT: movl %edi, 8(%eax)
; X86-NEXT: movl %ebx, 12(%eax)
; X86-NEXT: leal -12(%ebp), %esp
; X86-NEXT: popl %esi
; X86-NEXT: popl %edi
; X86-NEXT: popl %ebx
; X86-NEXT: popl %ebp
; X86-NEXT: retl $4
;		;
; X64-LABEL: scalar_i128:		; X64-LABEL: scalar_i128:
; X64: # %bb.0:		; X64: # %bb.0:
; X64-NEXT: pushq %r15		; X64-NEXT: pushq %r15
; X64-NEXT: pushq %r14		; X64-NEXT: pushq %r14
; X64-NEXT: pushq %r13		; X64-NEXT: pushq %r13
; X64-NEXT: pushq %r12		; X64-NEXT: pushq %r12
; X64-NEXT: pushq %rbx		; X64-NEXT: pushq %rbx

llvm/test/CodeGen/X86/div-rem-pair-recomposition-unsigned.ll

Show First 20 Lines • Show All 165 Lines • ▼ Show 20 Lines	; X64-NEXT: retq
store i64 %div, ptr %divdst, align 4		store i64 %div, ptr %divdst, align 4
%t1 = mul i64 %div, %y		%t1 = mul i64 %div, %y
%t2 = sub i64 %x, %t1		%t2 = sub i64 %x, %t1
ret i64 %t2		ret i64 %t2
}		}

define i128 @scalar_i128(i128 %x, i128 %y, ptr %divdst) nounwind {		define i128 @scalar_i128(i128 %x, i128 %y, ptr %divdst) nounwind {
; X86-LABEL: scalar_i128:		; X86-LABEL: scalar_i128:
; X86: # %bb.0:		; X86 doesn't have __divti3, so the urem is expanded into a loop.
; X86-NEXT: pushl %ebp		; X86: udiv-do-while
; X86-NEXT: movl %esp, %ebp
; X86-NEXT: pushl %ebx
; X86-NEXT: pushl %edi
; X86-NEXT: pushl %esi
; X86-NEXT: andl $-8, %esp
; X86-NEXT: subl $40, %esp
; X86-NEXT: movl 44(%ebp), %edi
; X86-NEXT: leal {{[0-9]+}}(%esp), %eax
; X86-NEXT: pushl 40(%ebp)
; X86-NEXT: pushl 36(%ebp)
; X86-NEXT: pushl 32(%ebp)
; X86-NEXT: pushl 28(%ebp)
; X86-NEXT: pushl 24(%ebp)
; X86-NEXT: pushl 20(%ebp)
; X86-NEXT: pushl 16(%ebp)
; X86-NEXT: pushl 12(%ebp)
; X86-NEXT: pushl %eax
; X86-NEXT: calll __udivti3
; X86-NEXT: addl $32, %esp
; X86-NEXT: movl {{[0-9]+}}(%esp), %ebx
; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
; X86-NEXT: movl {{[0-9]+}}(%esp), %esi
; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
; X86-NEXT: movl %edi, %edx
; X86-NEXT: movl %ecx, 12(%edi)
; X86-NEXT: movl %esi, 8(%edi)
; X86-NEXT: movl %eax, 4(%edi)
; X86-NEXT: movl %eax, %edi
; X86-NEXT: movl %ebx, (%edx)
; X86-NEXT: movl 28(%ebp), %eax
; X86-NEXT: imull %eax, %ecx
; X86-NEXT: mull %esi
; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NEXT: addl %ecx, %edx
; X86-NEXT: imull 32(%ebp), %esi
; X86-NEXT: addl %edx, %esi
; X86-NEXT: movl 36(%ebp), %eax
; X86-NEXT: movl %eax, %ecx
; X86-NEXT: imull %edi, %ecx
; X86-NEXT: mull %ebx
; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NEXT: addl %ecx, %edx
; X86-NEXT: movl 40(%ebp), %eax
; X86-NEXT: imull %ebx, %eax
; X86-NEXT: addl %edx, %eax
; X86-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %ecx # 4-byte Reload
; X86-NEXT: addl %ecx, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Folded Spill
; X86-NEXT: adcl %esi, %eax
; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NEXT: movl %ebx, %eax
; X86-NEXT: movl 28(%ebp), %ecx
; X86-NEXT: mull %ecx
; X86-NEXT: movl %edx, (%esp) # 4-byte Spill
; X86-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
; X86-NEXT: movl %edi, %eax
; X86-NEXT: mull %ecx
; X86-NEXT: movl %edx, %ecx
; X86-NEXT: movl %eax, %esi
; X86-NEXT: addl (%esp), %esi # 4-byte Folded Reload
; X86-NEXT: adcl $0, %ecx
; X86-NEXT: movl %ebx, %eax
; X86-NEXT: mull 32(%ebp)
; X86-NEXT: movl %edx, %ebx
; X86-NEXT: addl %esi, %eax
; X86-NEXT: movl %eax, (%esp) # 4-byte Spill
; X86-NEXT: adcl %ecx, %ebx
; X86-NEXT: setb %cl
; X86-NEXT: movl %edi, %eax
; X86-NEXT: mull 32(%ebp)
; X86-NEXT: addl %ebx, %eax
; X86-NEXT: movzbl %cl, %ecx
; X86-NEXT: adcl %ecx, %edx
; X86-NEXT: addl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Folded Reload
; X86-NEXT: adcl {{[-0-9]+}}(%e{{[sb]}}p), %edx # 4-byte Folded Reload
; X86-NEXT: movl 12(%ebp), %ecx
; X86-NEXT: subl {{[-0-9]+}}(%e{{[sb]}}p), %ecx # 4-byte Folded Reload
; X86-NEXT: movl 16(%ebp), %esi
; X86-NEXT: sbbl (%esp), %esi # 4-byte Folded Reload
; X86-NEXT: movl 20(%ebp), %edi
; X86-NEXT: sbbl %eax, %edi
; X86-NEXT: movl 24(%ebp), %ebx
; X86-NEXT: sbbl %edx, %ebx
; X86-NEXT: movl 8(%ebp), %eax
; X86-NEXT: movl %ecx, (%eax)
; X86-NEXT: movl %esi, 4(%eax)
; X86-NEXT: movl %edi, 8(%eax)
; X86-NEXT: movl %ebx, 12(%eax)
; X86-NEXT: leal -12(%ebp), %esp
; X86-NEXT: popl %esi
; X86-NEXT: popl %edi
; X86-NEXT: popl %ebx
; X86-NEXT: popl %ebp
; X86-NEXT: retl $4
;		;
; X64-LABEL: scalar_i128:		; X64-LABEL: scalar_i128:
; X64: # %bb.0:		; X64: # %bb.0:
; X64-NEXT: pushq %r15		; X64-NEXT: pushq %r15
; X64-NEXT: pushq %r14		; X64-NEXT: pushq %r14
; X64-NEXT: pushq %r13		; X64-NEXT: pushq %r13
; X64-NEXT: pushq %r12		; X64-NEXT: pushq %r12
; X64-NEXT: pushq %rbx		; X64-NEXT: pushq %rbx

llvm/test/CodeGen/X86/i128-sdiv.ll

	; X64-NEXT: sbbq %rcx, %rdx			; X64-NEXT: sbbq %rcx, %rdx
	; X64-NEXT: retq			; X64-NEXT: retq
	%tmp = sdiv i128 %x, -73786976294838206464			%tmp = sdiv i128 %x, -73786976294838206464
	ret i128 %tmp			ret i128 %tmp
	}			}

	define i128 @test3(i128 %x) nounwind {			define i128 @test3(i128 %x) nounwind {
	; X86-LABEL: test3:			; X86-LABEL: test3:
	; X86: # %bb.0:			; X86 doesn't have __divti3, so the urem is expanded into a loop.
	; X86-NEXT: pushl %ebp			; X86: udiv-do-while
	; X86-NEXT: movl %esp, %ebp
	; X86-NEXT: pushl %edi
	; X86-NEXT: pushl %esi
	; X86-NEXT: andl $-8, %esp
	; X86-NEXT: subl $16, %esp
	; X86-NEXT: movl 8(%ebp), %esi
	; X86-NEXT: movl %esp, %eax
	; X86-NEXT: pushl $-1
	; X86-NEXT: pushl $-5
	; X86-NEXT: pushl $-1
	; X86-NEXT: pushl $-3
	; X86-NEXT: pushl 24(%ebp)
	; X86-NEXT: pushl 20(%ebp)
	; X86-NEXT: pushl 16(%ebp)
	; X86-NEXT: pushl 12(%ebp)
	; X86-NEXT: pushl %eax
	; X86-NEXT: calll __divti3
	; X86-NEXT: addl $32, %esp
	; X86-NEXT: movl (%esp), %eax
	; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
	; X86-NEXT: movl {{[0-9]+}}(%esp), %edx
	; X86-NEXT: movl {{[0-9]+}}(%esp), %edi
	; X86-NEXT: movl %edi, 12(%esi)
	; X86-NEXT: movl %edx, 8(%esi)
	; X86-NEXT: movl %ecx, 4(%esi)
	; X86-NEXT: movl %eax, (%esi)
	; X86-NEXT: movl %esi, %eax
	; X86-NEXT: leal -8(%ebp), %esp
	; X86-NEXT: popl %esi
	; X86-NEXT: popl %edi
	; X86-NEXT: popl %ebp
	; X86-NEXT: retl $4
	;			;
	; X64-LABEL: test3:			; X64-LABEL: test3:
	; X64: # %bb.0:			; X64: # %bb.0:
	; X64-NEXT: pushq %rax			; X64-NEXT: pushq %rax
	; X64-NEXT: movq $-3, %rdx			; X64-NEXT: movq $-3, %rdx
	; X64-NEXT: movq $-5, %rcx			; X64-NEXT: movq $-5, %rcx
	; X64-NEXT: callq __divti3@PLT			; X64-NEXT: callq __divti3@PLT
	; X64-NEXT: popq %rcx			; X64-NEXT: popq %rcx
	; X64-NEXT: retq			; X64-NEXT: retq
	%tmp = sdiv i128 %x, -73786976294838206467			%tmp = sdiv i128 %x, -73786976294838206467
	ret i128 %tmp			ret i128 %tmp
	}			}

llvm/test/CodeGen/X86/i128-udiv.ll

	; X64-NEXT: xorl %edx, %edx			; X64-NEXT: xorl %edx, %edx
	; X64-NEXT: retq			; X64-NEXT: retq
	%tmp = udiv i128 %x, 73786976294838206464			%tmp = udiv i128 %x, 73786976294838206464
	ret i128 %tmp			ret i128 %tmp
	}			}

	define i128 @test2(i128 %x) nounwind {			define i128 @test2(i128 %x) nounwind {
	; X86-LABEL: test2:			; X86-LABEL: test2:
	; X86: # %bb.0:			; X86 doesn't have __divti3, so the urem is expanded into a loop.
	; X86-NEXT: pushl %ebp			; X86: udiv-do-while
	; X86-NEXT: movl %esp, %ebp
	; X86-NEXT: pushl %edi
	; X86-NEXT: pushl %esi
	; X86-NEXT: andl $-8, %esp
	; X86-NEXT: subl $16, %esp
	; X86-NEXT: movl 8(%ebp), %esi
	; X86-NEXT: movl %esp, %eax
	; X86-NEXT: pushl $-1
	; X86-NEXT: pushl $-4
	; X86-NEXT: pushl $0
	; X86-NEXT: pushl $0
	; X86-NEXT: pushl 24(%ebp)
	; X86-NEXT: pushl 20(%ebp)
	; X86-NEXT: pushl 16(%ebp)
	; X86-NEXT: pushl 12(%ebp)
	; X86-NEXT: pushl %eax
	; X86-NEXT: calll __udivti3
	; X86-NEXT: addl $32, %esp
	; X86-NEXT: movl (%esp), %eax
	; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
	; X86-NEXT: movl {{[0-9]+}}(%esp), %edx
	; X86-NEXT: movl {{[0-9]+}}(%esp), %edi
	; X86-NEXT: movl %edi, 12(%esi)
	; X86-NEXT: movl %edx, 8(%esi)
	; X86-NEXT: movl %ecx, 4(%esi)
	; X86-NEXT: movl %eax, (%esi)
	; X86-NEXT: movl %esi, %eax
	; X86-NEXT: leal -8(%ebp), %esp
	; X86-NEXT: popl %esi
	; X86-NEXT: popl %edi
	; X86-NEXT: popl %ebp
	; X86-NEXT: retl $4
	;			;
	; X64-LABEL: test2:			; X64-LABEL: test2:
	; X64: # %bb.0:			; X64: # %bb.0:
	; X64-NEXT: pushq %rax			; X64-NEXT: pushq %rax
	; X64-NEXT: xorl %edx, %edx			; X64-NEXT: xorl %edx, %edx
	; X64-NEXT: movq $-4, %rcx			; X64-NEXT: movq $-4, %rcx
	; X64-NEXT: callq __udivti3@PLT			; X64-NEXT: callq __udivti3@PLT
	; X64-NEXT: popq %rcx			; X64-NEXT: popq %rcx
	; X64-NEXT: retq			; X64-NEXT: retq
	%tmp = udiv i128 %x, -73786976294838206464			%tmp = udiv i128 %x, -73786976294838206464
	ret i128 %tmp			ret i128 %tmp
	}			}

	define i128 @test3(i128 %x) nounwind {			define i128 @test3(i128 %x) nounwind {
	; X86-LABEL: test3:			; X86-LABEL: test3:
	; X86: # %bb.0:			; X86 doesn't have __divti3, so the urem is expanded into a loop.
	; X86-NEXT: pushl %ebp			; X86: udiv-do-while
	; X86-NEXT: movl %esp, %ebp
	; X86-NEXT: pushl %edi
	; X86-NEXT: pushl %esi
	; X86-NEXT: andl $-8, %esp
	; X86-NEXT: subl $16, %esp
	; X86-NEXT: movl 8(%ebp), %esi
	; X86-NEXT: movl %esp, %eax
	; X86-NEXT: pushl $-1
	; X86-NEXT: pushl $-5
	; X86-NEXT: pushl $-1
	; X86-NEXT: pushl $-3
	; X86-NEXT: pushl 24(%ebp)
	; X86-NEXT: pushl 20(%ebp)
	; X86-NEXT: pushl 16(%ebp)
	; X86-NEXT: pushl 12(%ebp)
	; X86-NEXT: pushl %eax
	; X86-NEXT: calll __udivti3
	; X86-NEXT: addl $32, %esp
	; X86-NEXT: movl (%esp), %eax
	; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
	; X86-NEXT: movl {{[0-9]+}}(%esp), %edx
	; X86-NEXT: movl {{[0-9]+}}(%esp), %edi
	; X86-NEXT: movl %edi, 12(%esi)
	; X86-NEXT: movl %edx, 8(%esi)
	; X86-NEXT: movl %ecx, 4(%esi)
	; X86-NEXT: movl %eax, (%esi)
	; X86-NEXT: movl %esi, %eax
	; X86-NEXT: leal -8(%ebp), %esp
	; X86-NEXT: popl %esi
	; X86-NEXT: popl %edi
	; X86-NEXT: popl %ebp
	; X86-NEXT: retl $4
	;			;
	; X64-LABEL: test3:			; X64-LABEL: test3:
	; X64: # %bb.0:			; X64: # %bb.0:
	; X64-NEXT: pushq %rax			; X64-NEXT: pushq %rax
	; X64-NEXT: movq $-3, %rdx			; X64-NEXT: movq $-3, %rdx
	; X64-NEXT: movq $-5, %rcx			; X64-NEXT: movq $-5, %rcx
	; X64-NEXT: callq __udivti3@PLT			; X64-NEXT: callq __udivti3@PLT
	; X64-NEXT: popq %rcx			; X64-NEXT: popq %rcx
	; X64-NEXT: retq			; X64-NEXT: retq
	%tmp = udiv i128 %x, -73786976294838206467			%tmp = udiv i128 %x, -73786976294838206467
	ret i128 %tmp			ret i128 %tmp
	}			}

llvm/test/CodeGen/X86/libcall-sret.ll

This file was deleted.

	; RUN: llc -mtriple=i686-linux-gnu -o - %s | FileCheck %s

	@var = global i128 0

	; We were trying to convert the i128 operation into a libcall, but failing to
	; perform sret demotion when we couldn't return the result in registers. Make
	; sure we marshal the return properly:

	define void @test_sret_libcall(i128 %l, i128 %r) {
	; CHECK-LABEL: test_sret_libcall:

	; Stack for call: 4(sret ptr), 16(i128 %l), 16(128 %r). So next logical
	; (aligned) place for the actual sret data is %esp + 20.
	; CHECK: leal 20(%esp), [[SRET_ADDR:%[a-z]+]]
	; CHECK: pushl 72(%esp)
	; CHECK: pushl 72(%esp)
	; CHECK: pushl 72(%esp)
	; CHECK: pushl 72(%esp)
	; CHECK: pushl 72(%esp)
	; CHECK: pushl 72(%esp)
	; CHECK: pushl 72(%esp)
	; CHECK: pushl 72(%esp)
	; CHECK: pushl [[SRET_ADDR]]

	; CHECK: calll __udivti3

	; CHECK: addl $44, %esp
	; CHECK-DAG: movl 8(%esp), [[RES0:%[a-z]+]]
	; CHECK-DAG: movl 12(%esp), [[RES1:%[a-z]+]]
	; CHECK-DAG: movl 16(%esp), [[RES2:%[a-z]+]]
	; CHECK-DAG: movl 20(%esp), [[RES3:%[a-z]+]]
	; CHECK-DAG: movl [[RES0]], var
	; CHECK-DAG: movl [[RES1]], var+4
	; CHECK-DAG: movl [[RES2]], var+8
	; CHECK-DAG: movl [[RES3]], var+12
	%quot = udiv i128 %l, %r
	store i128 %quot, ptr @var
	ret void
	}

llvm/test/CodeGen/X86/opt-pipeline.ll

	[20 lines not shown]
	; CHECK-NEXT: Assumption Cache Tracker			; CHECK-NEXT: Assumption Cache Tracker
	; CHECK-NEXT: Profile summary info			; CHECK-NEXT: Profile summary info
	; CHECK-NEXT: Create Garbage Collector Module Metadata			; CHECK-NEXT: Create Garbage Collector Module Metadata
	; CHECK-NEXT: Machine Branch Probability Analysis			; CHECK-NEXT: Machine Branch Probability Analysis
	; CHECK-NEXT: Default Regalloc Eviction Advisor			; CHECK-NEXT: Default Regalloc Eviction Advisor
	; CHECK-NEXT: ModulePass Manager			; CHECK-NEXT: ModulePass Manager
	; CHECK-NEXT: Pre-ISel Intrinsic Lowering			; CHECK-NEXT: Pre-ISel Intrinsic Lowering
	; CHECK-NEXT: FunctionPass Manager			; CHECK-NEXT: FunctionPass Manager
				; CHECK-NEXT: Expand large div/rem
	; CHECK-NEXT: Expand Atomic instructions			; CHECK-NEXT: Expand Atomic instructions
	; CHECK-NEXT: Lower AMX intrinsics			; CHECK-NEXT: Lower AMX intrinsics
	; CHECK-NEXT: Lower AMX type for load/store			; CHECK-NEXT: Lower AMX type for load/store
	; CHECK-NEXT: Module Verifier			; CHECK-NEXT: Module Verifier
	; CHECK-NEXT: Dominator Tree Construction			; CHECK-NEXT: Dominator Tree Construction
	; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)			; CHECK-NEXT: Basic Alias Analysis (stateless AA impl)
	; CHECK-NEXT: Natural Loop Information			; CHECK-NEXT: Natural Loop Information
	; CHECK-NEXT: Canonicalize natural loops			; CHECK-NEXT: Canonicalize natural loops
	[190 lines not shown]
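The new `Expand large div/rem` pipeline entry runs on every function, but per the revision summary it only rewrites div/rem instructions wider than a per-target limit. The snippet below is a minimal sketch of that width check, not the pass's actual code: `shouldExpandDivRem` and `kNoExpansion` are hypothetical names, and representing "no expansion" as the maximum `unsigned` value is an assumption made for illustration.

```cpp
#include <limits>

// Hypothetical sentinel for targets that keep the default ("no expansion").
// Assumption: a limit so large that no integer width ever exceeds it.
constexpr unsigned kNoExpansion = std::numeric_limits<unsigned>::max();

// Hypothetical predicate: a div/rem is expanded only when its operand
// width is strictly greater than the widest division the target handles.
constexpr bool shouldExpandDivRem(unsigned bitWidth, unsigned maxSupportedBits) {
  return bitWidth > maxSupportedBits;
}
```

With a limit of 128 bits (the value the summary gives for the X86, Arm and AArch64 backends), an `i65` division is left alone and an `i129` division is expanded, which matches the x86_64 expectations in `udivmodei5.ll`.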

llvm/test/CodeGen/X86/pr38539.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=x86_64-unknown -verify-machineinstrs | FileCheck %s --check-prefix=X64			; RUN: llc < %s -mtriple=x86_64-unknown -verify-machineinstrs | FileCheck %s --check-prefix=X64
	; RUN: llc < %s -mtriple=i686-unknown -verify-machineinstrs | FileCheck %s --check-prefix=X86			; RUN: llc < %s -mtriple=i686-unknown -verify-machineinstrs | FileCheck %s --check-prefix=X86

	; This test is targeted at 64-bit mode. It used to crash due to the creation of an EXTRACT_SUBREG after the peephole pass had run.			; This test is targeted at 64-bit mode. It used to crash due to the creation of an EXTRACT_SUBREG after the peephole pass had run.
	define void @f() {			define void @f() {
	; X64-LABEL: f:			; X64-LABEL: f:
	; X64: # %bb.0: # %BB			; X64: # %bb.0: # %BB
	; X64-NEXT: movzbl (%rax), %eax			; X64-NEXT: movzbl (%rax), %eax
	; X64-NEXT: cmpb $0, (%rax)			; X64-NEXT: cmpb $0, (%rax)
	; X64-NEXT: setne (%rax)			; X64-NEXT: setne (%rax)
	; X64-NEXT: leaq -{{[0-9]+}}(%rsp), %rax			; X64-NEXT: leaq -{{[0-9]+}}(%rsp), %rax
	; X64-NEXT: movq %rax, (%rax)			; X64-NEXT: movq %rax, (%rax)
	; X64-NEXT: movb $0, (%rax)			; X64-NEXT: movb $0, (%rax)
	; X64-NEXT: retq			; X64-NEXT: retq
	;
	; X86-LABEL: f:
	; X86: # %bb.0: # %BB
	; X86-NEXT: pushl %ebp
	; X86-NEXT: .cfi_def_cfa_offset 8
	; X86-NEXT: .cfi_offset %ebp, -8
	; X86-NEXT: movl %esp, %ebp
	; X86-NEXT: .cfi_def_cfa_register %ebp
	; X86-NEXT: andl $-8, %esp
	; X86-NEXT: subl $16, %esp
	; X86-NEXT: movzbl (%eax), %eax
	; X86-NEXT: cmpb $0, (%eax)
	; X86-NEXT: setne (%eax)
	; X86-NEXT: leal -{{[0-9]+}}(%esp), %eax
	; X86-NEXT: movl %eax, (%eax)
	; X86-NEXT: movb $0, (%eax)
	; X86-NEXT: movl %ebp, %esp
	; X86-NEXT: popl %ebp
	; X86-NEXT: .cfi_def_cfa %esp, 4
	; X86-NEXT: retl
	BB:			BB:
	%A30 = alloca i66			%A30 = alloca i66
	%L17 = load i66, ptr %A30			%L17 = load i66, ptr %A30
	%B20 = and i66 %L17, -1			%B20 = and i66 %L17, -1
	%G2 = getelementptr i66, ptr %A30, i1 true			%G2 = getelementptr i66, ptr %A30, i1 true
	%L10 = load volatile i8, ptr undef			%L10 = load volatile i8, ptr undef
	%L11 = load volatile i8, ptr undef			%L11 = load volatile i8, ptr undef
	%B6 = udiv i8 %L10, %L11			%B6 = udiv i8 %L10, %L11
	[70 lines not shown]

llvm/test/CodeGen/X86/udivmodei5.ll

This file was added.

				; RUN: llc < %s -mtriple=i686-unknown-unknown | FileCheck %s --check-prefix=X86
				; RUN: llc < %s -mtriple=x86_64-unknown-unknown | FileCheck %s --check-prefix=X64

				; On i686, this is expanded into a loop. On x86_64, this calls __udivti3.
				define i65 @udiv65(i65 %a, i65 %b) nounwind {
				; X86-LABEL: udiv65:
				; X86-NOT: call
				;
				; X64-LABEL: udiv65:
				; X64: # %bb.0:
				; X64-NEXT: pushq %rax
				; X64-NEXT: andl $1, %esi
				; X64-NEXT: andl $1, %ecx
				; X64-NEXT: callq __udivti3@PLT
				; X64-NEXT: popq %rcx
				; X64-NEXT: retq
				%res = udiv i65 %a, %b
				ret i65 %res
				}

				define i129 @udiv129(i129 %a, i129 %b) nounwind {
				; X86-LABEL: udiv129:
				; X86-NOT: call
				;
				; X64-LABEL: udiv129:
				; X64-NOT: call
				%res = udiv i129 %a, %b
				ret i129 %res
				}

				define i129 @urem129(i129 %a, i129 %b) nounwind {
				; X86-LABEL: urem129:
				; X86-NOT: call
				;
				; X64-LABEL: urem129:
				; X64-NOT: call
				%res = urem i129 %a, %b
				ret i129 %res
				}

				define i129 @sdiv129(i129 %a, i129 %b) nounwind {
				; X86-LABEL: sdiv129:
				; X86-NOT: call
				;
				; X64-LABEL: sdiv129:
				; X64-NOT: call
				%res = sdiv i129 %a, %b
				ret i129 %res
				}

				define i129 @srem129(i129 %a, i129 %b) nounwind {
				; X86-LABEL: srem129:
				; X86-NOT: call
				;
				; X64-LABEL: srem129:
				; X64-NOT: call
				%res = srem i129 %a, %b
				ret i129 %res
				}

				; Some higher sizes
				define i257 @sdiv257(i257 %a, i257 %b) nounwind {
				; X86-LABEL: sdiv257:
				; X86-NOT: call
				;
				; X64-LABEL: sdiv257:
				; X64-NOT: call
				%res = sdiv i257 %a, %b
				ret i257 %res
				}
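The `X86-NOT: call` and `X64-NOT: call` checks above assert that wide divisions become inline code (a loop, per the test comment) rather than a libcall. As a rough illustration of what such a loop computes, here is a minimal sketch of bit-by-bit (shift-subtract) unsigned long division on a multi-limb integer. It is the textbook algorithm, not the IR the pass actually emits; `Wide`, `fromU64`, `udivrem` and the other helpers are hypothetical names introduced only for this example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical wide unsigned integer: N 32-bit limbs, least significant first.
template <std::size_t N> using Wide = std::array<std::uint32_t, N>;

// Build a wide value from a 64-bit one (requires N >= 2).
template <std::size_t N> Wide<N> fromU64(std::uint64_t v) {
  Wide<N> w{};
  w[0] = static_cast<std::uint32_t>(v);
  w[1] = static_cast<std::uint32_t>(v >> 32);
  return w;
}

template <std::size_t N> bool getBit(const Wide<N> &v, std::size_t i) {
  return (v[i / 32] >> (i % 32)) & 1u;
}

template <std::size_t N> void setBit(Wide<N> &v, std::size_t i) {
  v[i / 32] |= (std::uint32_t{1} << (i % 32));
}

// v = (v << 1) | bitIn; returns the bit shifted out of the top limb.
template <std::size_t N> bool shiftLeft1(Wide<N> &v, bool bitIn) {
  std::uint32_t carry = bitIn ? 1u : 0u;
  for (std::size_t i = 0; i < N; ++i) {
    std::uint32_t next = v[i] >> 31;
    v[i] = (v[i] << 1) | carry;
    carry = next;
  }
  return carry != 0;
}

// Unsigned comparison a >= b, scanning from the most significant limb.
template <std::size_t N> bool geq(const Wide<N> &a, const Wide<N> &b) {
  for (std::size_t i = N; i-- > 0;)
    if (a[i] != b[i])
      return a[i] > b[i];
  return true;
}

// a -= b, modulo 2^(32*N), propagating the borrow across limbs.
template <std::size_t N> void subInPlace(Wide<N> &a, const Wide<N> &b) {
  std::uint64_t borrow = 0;
  for (std::size_t i = 0; i < N; ++i) {
    std::uint64_t diff = std::uint64_t{a[i]} - b[i] - borrow;
    a[i] = static_cast<std::uint32_t>(diff);
    borrow = diff >> 63; // top bit is set exactly when the subtraction wrapped
  }
}

// Shift-subtract division: one quotient bit per iteration, high bit first.
// The caller must ensure d != 0 (division by zero is not handled here).
template <std::size_t N>
void udivrem(const Wide<N> &n, const Wide<N> &d, Wide<N> &q, Wide<N> &r) {
  q.fill(0);
  r.fill(0);
  for (std::size_t i = N * 32; i-- > 0;) {
    // Bring down the next dividend bit into the running remainder.
    bool overflow = shiftLeft1(r, getBit(n, i));
    // If the shift overflowed, the true remainder exceeds d, so subtract.
    if (overflow || geq(r, d)) {
      subInPlace(r, d);
      setBit(q, i);
    }
  }
}
```

For example, dividing 2^64 (a `Wide<3>` whose third limb is 1) by 3 yields a quotient of 0x5555555555555555 and a remainder of 1. The loop runs once per bit of the operand width, so it is slow but needs no runtime library support, which is the property the `-NOT: call` checks rely on.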

Diff 458169
