This is an archive of the discontinued LLVM Phabricator instance.

Differential D154805

[DAGCombiner] Fold IEEE `fmul`/`fdiv` by Pow2 to `add`/`sub` of exp
ClosedPublic

Authored by goldstein.w.n on Jul 9 2023, 4:01 PM.

Details

Reviewers

RKSimon
pengfei
nikic
arsenm
efriedma

Commits

rG47c642f9a0e9: [DAGCombiner] Fold IEEE `fmul`/`fdiv` by Pow2 to `add`/`sub` of exp

Summary

Note: This is moving D154678 which previously implemented this in
InstCombine. Concerns were brought up that this was de-canonicalizing
and really targeting a codegen improvement, so placing in DAGCombiner.

This implements:

(fmul C, (uitofp Pow2))
    -> (bitcast_to_FP (add (bitcast_to_INT C), Log2(Pow2) << mantissa))
(fdiv C, (uitofp Pow2))
    -> (bitcast_to_FP (sub (bitcast_to_INT C), Log2(Pow2) << mantissa))

The motivation is mostly fdiv where 2^(-p) is a fairly common
expression.
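
As a rough illustration of the two patterns above outside of LLVM, the sketch below applies the same exponent arithmetic to a 32-bit IEEE float. The helper names (`bitsOf`, `floatOf`, `mulByPow2`, `divByPow2`) are made up for this example and are not part of the patch; the code also assumes `float` is IEEE binary32 and that the constant is normal with the result staying in the normal range.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Illustrative helpers (not from the patch): reinterpret float <-> uint32_t.
static uint32_t bitsOf(float F) {
  uint32_t U;
  std::memcpy(&U, &F, sizeof(U));
  return U;
}

static float floatOf(uint32_t U) {
  float F;
  std::memcpy(&F, &U, sizeof(F));
  return F;
}

// IEEE binary32 stores 23 explicit mantissa bits below the exponent field.
constexpr unsigned MantissaBits = 23;

// C * 2^Log2 as an integer add on the exponent field.
// Only valid when C is normal and the result stays in the normal range.
static float mulByPow2(float C, unsigned Log2) {
  return floatOf(bitsOf(C) + (uint32_t(Log2) << MantissaBits));
}

// C / 2^Log2 as an integer sub on the exponent field (same preconditions).
static float divByPow2(float C, unsigned Log2) {
  return floatOf(bitsOf(C) - (uint32_t(Log2) << MantissaBits));
}
```

For example, `divByPow2(3.0f, 4)` yields the same bits as `3.0f / 16.0f`, without performing a floating-point division.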

The patch is intentionally conservative about the transform, only
doing so if:

1) we have IEEE floats
2) C is normal
3) add/sub of max(Log2(Pow2)) stays in the min/max exponent bounds.

Alive2 can't realistically prove this, but did test float16/float32
cases (within the bounds of the above rules) exhaustively.
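
The conditions above can be sketched for float32 as follows. This is only an approximation of the check for illustration: `foldIsSafe` and `foldMatches` are hypothetical names, the bounds -126 and 127 are the binary32 normal exponent limits, and `MaxLog2` stands in for the largest exponent change the integer operand could cause.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

static uint32_t toBits(float F) {
  uint32_t U;
  std::memcpy(&U, &F, sizeof(U));
  return U;
}

// Conservative check: C must be normal, and shifting its exponent by up to
// MaxLog2 (up for fmul, down for fdiv) must stay strictly inside the
// binary32 normal exponent range [-126, 127].
static bool foldIsSafe(float C, int MaxLog2, bool IsDiv) {
  if (!std::isnormal(C))
    return false;
  int CurExp = std::ilogb(C);
  int MinExp = IsDiv ? CurExp - MaxLog2 : CurExp;
  int MaxExp = IsDiv ? CurExp : CurExp + MaxLog2;
  return MinExp > -126 && MaxExp < 127;
}

// Brute-force comparison: does the integer add/sub give the same bits as
// the real fmul/fdiv for every K in [0, MaxLog2]?
static bool foldMatches(float C, int MaxLog2, bool IsDiv) {
  for (int K = 0; K <= MaxLog2; ++K) {
    float Pow2 = std::ldexp(1.0f, K);
    float Expected = IsDiv ? C / Pow2 : C * Pow2;
    uint32_t Delta = uint32_t(K) << 23;
    uint32_t Folded = IsDiv ? toBits(C) - Delta : toBits(C) + Delta;
    if (Folded != toBits(Expected))
      return false;
  }
  return true;
}
```

When `foldIsSafe` accepts a constant, `foldMatches` is expected to agree for every shift amount up to the bound; constants near the edges of the exponent range (for instance `1.5 * 2^100` under multiplication) are rejected, and there the integer arithmetic does diverge from the floating-point result.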

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	60,070 ms	x64 debian > MLIR.Examples/standalone::test.toy
	60,100 ms	x64 debian > ThreadSanitizer-x86_64.ThreadSanitizer-x86_64::restore_stack.cpp

Event Timeline

goldstein.w.n created this revision.Jul 9 2023, 4:01 PM

Herald added a project: Restricted Project.Jul 9 2023, 4:01 PM

Herald added subscribers: StephenFan, steven.zhang, hiraditya.

goldstein.w.n requested review of this revision.Jul 9 2023, 4:01 PM

Herald added a project: Restricted Project.Jul 9 2023, 4:01 PM

Herald added subscribers: llvm-commits, wdng.

Harbormaster completed remote builds in B244008: Diff 538463.Jul 9 2023, 4:01 PM

goldstein.w.n added a child revision: D154804: [X86] Add tests for folding `fmul`/`fdiv` by Pow2 to `add`/`sub` of exp; NFC.Jul 9 2023, 4:02 PM

goldstein.w.n mentioned this in D154678: [InstCombine] Fold IEEE `fmul`/`fdiv` by Pow2 to `add`/`sub` of exp.

RKSimon added inline comments.Jul 10 2023, 3:18 AM

llvm/include/llvm/CodeGen/TargetLowering.h
3975	Describe/tag the purpose of each argument or use the same names in the patterns to make it more obvious.
3981	(style) Remove the (void) lines
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
16152	Can this happen? Make it an assert if you must.
16154	Remove MaxDepth and use SelectionDAG::MaxRecursionDepth
16158	If we allow bitcast can't we end with cases where we've flipped between vectors/scalars or changed vectorelementcount?
16178	You can use ISD::matchUnaryPredicate to support non-uniform vectors here - see isDivisorPowerOfTwo for a similar case.
16180	Standard practice in DAG is to increment Depth at a recursive functional call, not inside it.
16201	Use Depth + 1 here and adjust the depth tolerance above
16209	Use Depth + 1 here and adjust the depth tolerance above
16211	Use Depth + 1 here and adjust the depth tolerance above
16220	Use Depth + 1 here and adjust the depth tolerance above
16222	Use Depth + 1 here and adjust the depth tolerance above
16256	Don't you need to ensure that the scalar size in bits stays the same through the bitcast? AFAICT you should only accept bitcasts that cast from fp to/from int (both scalar/vector and same scalar width)?

goldstein.w.n removed a child revision: D154804: [X86] Add tests for folding `fmul`/`fdiv` by Pow2 to `add`/`sub` of exp; NFC.Jul 10 2023, 9:27 AM

goldstein.w.n added a parent revision: D154804: [X86] Add tests for folding `fmul`/`fdiv` by Pow2 to `add`/`sub` of exp; NFC.

goldstein.w.n marked 13 inline comments as done.Jul 10 2023, 10:40 AM

goldstein.w.n added inline comments.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
16158	Dropped bitcast peeking here. It doesn't really add any value because we can't handle any of the cases where a bitcast would be needed (Int <-> Vec <-> FP).
16256	When we call takeLog2 it will cast the new SDValue to the proper integer type. But dropping peekThroughBitcasts here as it doesn't really add any value.

Variety of fixes

goldstein.w.n added a child revision: D154868: [DAGCombiner] Extend `combineFMulOrFDivWithIntPow2` to work for non-splat float vecs.Jul 10 2023, 10:41 AM

Harbormaster completed remote builds in B244212: Diff 538733.Jul 10 2023, 1:03 PM

Can you add a test where the original fmul would fold into an fma? I think you're worse off doing this if you're interfering with FMA formation

llvm/include/llvm/CodeGen/TargetLowering.h
3971	I think this should just use the fmul case, and turn the compatible fdivs to fmul
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
16188	You don't need to bother special casing the splat case. getConstant on a vector will just split this out to a build_vector of constants anyway
16201	I'd kind of expect there to be a helper that does this already?
16216	Move isOneConstant as the last condition
16488	Can you add a test where this w

goldstein.w.n added inline comments.Jul 11 2023, 2:17 PM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
16488	Where this w?

arsenm added inline comments.Jul 11 2023, 2:18 PM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
16488	I moved the comment to the main box, the fma formation one

goldstein.w.n marked 4 inline comments as done.Jul 11 2023, 4:19 PM

goldstein.w.n added inline comments.

llvm/include/llvm/CodeGen/TargetLowering.h
3971	I'm not sure I understand. We will need an `fdiv` no matter what, at the very least for the reciprocal of the pow2.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
16201	There is `DAG.getZExtOrTrunc` but it doesn't check bitcast or if oldvt == newvt.
16488	Okay, I added an fma test `fmul_pow_shl_cnt_vec_preserve_fma` and it does so (the call to this transform is explicitly after the fma fold. I added a comment as well to be clearer about that).

Cleanup + make FMA stuff more clear

goldstein.w.n mentioned this in D146121: [DAG] Move lshr narrowing from visitANDLike to SimplifyDemandedBits.Jul 11 2023, 5:14 PM

Harbormaster completed remote builds in B244634: Diff 539326.Jul 11 2023, 9:41 PM

ping.

Rebase

arsenm added inline comments.Jul 17 2023, 4:31 PM

llvm/include/llvm/CodeGen/TargetLowering.h
3971	I mean you can rewrite fdiv by power of 2 as fmul by 2 to the negative power

goldstein.w.n added inline comments.Jul 17 2023, 4:46 PM

llvm/include/llvm/CodeGen/TargetLowering.h
3971	but `2 ^ -N` requires a division. I.e `1 / (2 ^ N)` so I don't see the benefit.

Harbormaster completed remote builds in B245958: Diff 541167.Jul 17 2023, 5:15 PM

Rebase

Harbormaster completed remote builds in B247330: Diff 543078.Jul 21 2023, 8:39 PM

goldstein.w.n mentioned this in D156029: [InstCombine] icmp udiv transform.Jul 27 2023, 7:42 AM

ping

Have you looked at using the existing DAG::isKnownToBeAPowerOfTwo and DAGCombiner::BuildLogBase2 methods?

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
16258	Why not make this a DAGCombiner method directly?
llvm/lib/Target/X86/X86ISelLowering.cpp
25042	Add a TODO - as I'm sure we can be more aggressive here, vXi64 uitofp in particular on pre-AVX512 targets, but smaller integers are often worth it since we'd have to extend them anyway for the int-to-fp conversion.

In D154805#4544765, @RKSimon wrote:

Have you looked at using the existing DAG::isKnownToBeAPowerOfTwo and DAGCombiner::BuildLogBase2 methods?

Edit: after looking a little more, think this is a better approach. Currently working on a patch to improve both those functions, then will rebase this on top.
Just did. That would work in a sense. The rationale for keeping them separate is that takeLog2, as is, is basically guaranteed to return an expression that's as expensive as or less expensive than the pow2 op. I.e.
if we have (min a, b), we might return (min log2_a, log2_b) but wouldn't return that if we didn't already find a min.
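
To make the "no more expensive" point concrete, here is a small sketch of the kind of rewrites involved, written over plain integers rather than DAG nodes. The function names are invented for the example; the identities are the ones listed in the patch comments (`log2(X << Y) -> log2(X) + Y`, `log2(umin(X, Y)) -> umin(log2(X), log2(Y))`), and each rewritten form keeps the same shape as the input: one shift becomes one add, one min stays one min.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Reference log2 for a value known to be a power of two.
static unsigned log2OfPow2(uint32_t V) {
  unsigned L = 0;
  while (V > 1) {
    V >>= 1;
    ++L;
  }
  return L;
}

// log2(X << Y) -> log2(X) + Y, valid when the shift does not wrap.
static unsigned log2OfShl(uint32_t X, unsigned Y) {
  return log2OfPow2(X) + Y;
}

// log2(umin(X, Y)) -> umin(log2(X), log2(Y))
static unsigned log2OfUMin(uint32_t X, uint32_t Y) {
  return std::min(log2OfPow2(X), log2OfPow2(Y));
}

// log2(umax(X, Y)) -> umax(log2(X), log2(Y))
static unsigned log2OfUMax(uint32_t X, uint32_t Y) {
  return std::max(log2OfPow2(X), log2OfPow2(Y));
}
```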

Think that highlights this might be more appropriately named findInexpensiveLog2 (changed).

Looking at our current combine ability we also seem to not optimize trivial log2 cases of BuildLogBase2 i.e:

declare i32 @llvm.ctlz.i32(i32, i1)

define i32 @trivial_log2(i32 %x) {
  %s = shl i32 1, %x
  %r = call i32 @llvm.ctlz.i32(i32 %s, i1 true)
  ret i32 %r
}

Doesn't optimize out the shl/ctlz.
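
For reference, the simplification one would presumably expect here is `ctlz(1 << x) == 31 - x` for an in-range shift amount on i32. The sketch below checks that identity with a hand-written leading-zero count; `countLeadingZeros32` and `ctlzOfShlOneIsSub` are names chosen for this example only.

```cpp
#include <cassert>
#include <cstdint>

// Portable count-leading-zeros for 32-bit values (returns 32 for zero).
static unsigned countLeadingZeros32(uint32_t V) {
  unsigned N = 0;
  for (uint32_t Mask = 0x80000000u; Mask != 0 && (V & Mask) == 0; Mask >>= 1)
    ++N;
  return N;
}

// Checks ctlz(1 << X) == 31 - X for every in-range shift amount X.
static bool ctlzOfShlOneIsSub() {
  for (unsigned X = 0; X < 32; ++X)
    if (countLeadingZeros32(uint32_t(1) << X) != 31 - X)
      return false;
  return true;
}
```

In other words, the shl/ctlz pair in `trivial_log2` could in principle be replaced by a single subtract from the bit width minus one.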

I can follow up with patches to integrate takeLog2 into BuildLogBase2

In D154805#4544765, @RKSimon wrote:

Have you looked at using the existing DAG::isKnownToBeAPowerOfTwo and DAGCombiner::BuildLogBase2 methods?

Done, includes a bit of a refactor for both those methods as they were lacking.

goldstein.w.n marked an inline comment as done.Aug 1 2023, 12:39 AM

Use BuildLogBase2, put in DAGCombiner::, add TODO

Harbormaster completed remote builds in B249412: Diff 545944.Aug 1 2023, 12:40 AM

Rebase

Harbormaster completed remote builds in B249814: Diff 546516.Aug 2 2023, 1:41 PM

Rebase

Harbormaster completed remote builds in B250121: Diff 546932.Aug 3 2023, 10:29 AM

Rebase

Harbormaster completed remote builds in B250357: Diff 547254.Aug 4 2023, 11:46 AM

ping.

RKSimon added inline comments.Aug 15 2023, 3:09 AM

llvm/include/llvm/CodeGen/TargetLowering.h
3984	This should probably be inside TargetLoweringBase near the top of the file - somewhere near shouldProduceAndByConstByHoistingConstFromShiftsLHSOfAnd - that's where we put the per-target optimization controls switches.

goldstein.w.n marked an inline comment as done.Aug 15 2023, 4:32 PM

Move switch

arsenm added inline comments.Aug 15 2023, 4:44 PM

llvm/include/llvm/CodeGen/TargetLowering.h
822–823	I think the fdiv by power of 2 should be generally converted to fmul by inverse power of 2 in the first place
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
16160	ANY_EXTEND seems dangerous here

Harbormaster completed remote builds in B252798: Diff 550541.Aug 15 2023, 9:59 PM

goldstein.w.n marked an inline comment as done.Aug 16 2023, 12:02 AM

goldstein.w.n added inline comments.

llvm/include/llvm/CodeGen/TargetLowering.h
822–823	Maybe, although you still need the reciprocal, which is still a division. But would rather make that tradeoff a separate patch either way.

Remove any_extend

Harbormaster completed remote builds in B252872: Diff 550642.Aug 16 2023, 3:53 AM

@arsenm Do you have any more feedback?

In D154805#4610773, @RKSimon wrote:

@arsenm Do you have any more feedback?

I think you're worse off doing this if ldexp is legal (not that we have the combine to form ldexp here)

In D154805#4611715, @arsenm wrote:

In D154805#4610773, @RKSimon wrote:

@arsenm Do you have any more feedback?

I think you're worse off doing this if ldexp is legal (not that we have the combine to form ldexp here)

why is that? ldexp gets lowered as mul/div and it's not really possible to transform ldexp into this.
EDIT: Am happy to add a TODO to look into this again should ldexp support be fleshed out.

In D154805#4611744, @goldstein.w.n wrote:

why is that? ldexp gets lowered as mul/div and it's not really possible to transform ldexp into this.

It doesn't lower to mul/div (and it may need a range check, at least for div). Most targets use the libcall

It's approximately

fmul x, pow2_k => ldexp(x, log2(pow2_k))
fdiv x, pow2_k => ldexp(x, -log2(pow2_k))

with a possible additional precondition for the exponent range
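
A minimal sketch of that correspondence using the C++ standard library's `std::ldexp`, assuming the second operand is a known power of two (the multiply case uses the positive exponent, the divide case the negated one). The helper names are hypothetical and this is not how LLVM would form an ldexp node; it only shows the arithmetic equivalence.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// log2 of a value assumed to be a power of two.
static int log2OfPow2(uint32_t V) {
  int L = 0;
  while (V > 1) {
    V >>= 1;
    ++L;
  }
  return L;
}

// x * pow2_k expressed as ldexp(x, log2(pow2_k)).
static float mulViaLdexp(float X, uint32_t Pow2K) {
  return std::ldexp(X, log2OfPow2(Pow2K));
}

// x / pow2_k expressed as ldexp(x, -log2(pow2_k)).
static float divViaLdexp(float X, uint32_t Pow2K) {
  return std::ldexp(X, -log2OfPow2(Pow2K));
}
```

Unlike the raw exponent add/sub, `std::ldexp` handles overflow and subnormal results itself, which is where the extra range handling mentioned above comes from.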

In D154805#4612046, @arsenm wrote:

In D154805#4611744, @goldstein.w.n wrote:

why is that? ldexp gets lowered as mul/div and it's not really possible to transform ldexp into this.

It doesn't lower to mul/div (and it may need a range check, at least for div). Most targets use the libcall

Err, was thinking "implemented with", but that's wrong too.
Looks like ldexp is implemented with the same trick as here (at least in glibc/llvm-libc).
We are essentially inlining the fast path.
So I don't really see how this could cause a slowdown given it gets to skip all the
checks and is inlined.
I can see the argument against having it in InstCombine, but as it's implemented here,
it's at the end of the line, so to speak, and there isn't much left for it to potentially
de-optimize.

It's approximately
fmul x, pow2_k => ldexp(x, log2(pow2_k))
fdiv x, pow2_k => ldexp(x, -log2(pow2_k))
with a possible additional precondition for the exponent range

I tried implementing this through ldexp, but a lot of powers of two are known from shifts
and it's quite useful to do the analysis while it's still in shift form, as we can easily limit
the exponent. (The very fact that it's (UINT_MAX * float) lets us bound stuff, OTOH (float << UINT_MAX) is unwieldy).

If there were flags attached to ldexp that bounded the exponent, I agree it would be
best to get to ldexp in InstCombine and then use ldexp for the lowering.

Rebase + add todo for ldexp

Herald added subscribers: kerbowa, jvesely.Sep 2 2023, 1:15 PM

Harbormaster completed remote builds in B256424: Diff 555597.Sep 2 2023, 1:16 PM

@arsenm I added some AMDGPU tests + some tests of ldexp so it's easier to compare the results of this patch.
I would also be happy to add a patch so this code can also generate ldexp for the fmul case if the target prefers it, if that helps address your concerns.

ping.

ping

LGTM - please can you raise an issue describing the missing (fmul x, pow2_k) -> (ldexp x, (log2 pow2_k)) fold

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
27116	(style) It'd be better to move the takeInexpensiveLog2 static implementation directly above BuildLogBase2 if that is its only user.

This revision is now accepted and ready to land.Sep 19 2023, 10:00 AM

In D154805#4648261, @RKSimon wrote:

LGTM - please can you raise an issue describing the missing (fmul x, pow2_k) -> (ldexp x, (log2 pow2_k)) fold

Done: see https://github.com/llvm/llvm-project/issues/66794

Move takeInexpensiveLog2 to closer to its use-case

N/A

Harbormaster completed remote builds in B257421: Diff 557064.Sep 19 2023, 1:21 PM

Closed by commit rG47c642f9a0e9: [DAGCombiner] Fold IEEE `fmul`/`fdiv` by Pow2 to `add`/`sub` of exp (authored by goldstein.w.n).Sep 20 2023, 11:28 AM

This revision was automatically updated to reflect the committed changes.

goldstein.w.n added a commit: rG47c642f9a0e9: [DAGCombiner] Fold IEEE `fmul`/`fdiv` by Pow2 to `add`/`sub` of exp.

dmgreen added a subscriber: dmgreen.Sep 22 2023, 8:40 AM

dmgreen added inline comments.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
27170–27171	Hello. It looks like these should be the other way around? Otherwise CurVT might be == NewVT, but ToCast has now looked through a zext. Testcase: https://godbolt.org/z/xd7TbGa6r

goldstein.w.n added inline comments.Sep 22 2023, 8:52 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
27170–27171	Yup, fix will be posted shortly!

Revision Contents

Path	Size
llvm/include/llvm/CodeGen/TargetLowering.h	18 lines
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp	214 lines
llvm/lib/Target/X86/X86ISelLowering.h	3 lines
llvm/lib/Target/X86/X86ISelLowering.cpp	16 lines
llvm/test/CodeGen/X86/fold-int-pow2-with-fmul-or-fdiv.ll	538 lines

Diff 538733

llvm/include/llvm/CodeGen/TargetLowering.h

[813 lines hidden] virtual bool preferIncOfAddToSubOfNot(EVT VT) const {
     // By default, let's assume that everyone prefers the form with two add's.
     return true;
   }

   // By default prefer folding (abs (sub nsw x, y)) -> abds(x, y). Some targets
   // may want to avoid this to prevent loss of sub_nsw pattern.
   virtual bool preferABDSToABSWithNSW(EVT VT) const {
     return true;
   }

>> arsenm (Not Done): I think the fdiv by power of 2 should be generally converted to fmul by inverse power of 2 in the first place
>> goldstein.w.n (Done): Maybe, although you still need the reciprocal, which is still a division. But would rather make that tradeoff a separate patch either way.
   // Return true if the target wants to transform Op(Splat(X)) -> Splat(Op(X))
   virtual bool preferScalarizeSplat(SDNode *N) const { return true; }

   /// Return true if the target wants to use the optimization that
   /// turns ext(promotableInst1(...(promotableInstN(load)))) into
   /// promotedInst1(...(promotedInstN(ext(load)))).
   bool enableExtLdPromotion() const { return EnableExtLdPromotion; }

[3,128 lines hidden] public:

   /// Return true if this function can prove that \p Op is never poison
   /// and, if \p PoisonOnly is false, does not have undef bits. The DemandedElts
   /// argument limits the check to the requested vector elements.
   virtual bool isGuaranteedNotToBeUndefOrPoisonForTargetNode(
       SDValue Op, const APInt &DemandedElts, const SelectionDAG &DAG,
       bool PoisonOnly, unsigned Depth) const;

+  // Return true if its desirable to perform the following transform:
+  // (fmul C, (uitofp Pow2))
+  // -> (bitcast_to_FP (add (bitcast_to_INT C), Log2(Pow2) << mantissa))
+  // (fdiv C, (uitofp Pow2))
>> arsenm (Not Done): I think this should just use the fmul case, and turn the compatible fdivs to fmul
>> goldstein.w.n (Done): I'm not sure I understand. We will need an `fdiv` no matter what, at the very least for the reciprocal of the pow2.
>> arsenm (Not Done): I mean you can rewrite fdiv by power of 2 as fmul by 2 to the negative power
>> goldstein.w.n (Done): but `2 ^ -N` requires a division. I.e `1 / (2 ^ N)` so I don't see the benefit.
+  // -> (bitcast_to_FP (sub (bitcast_to_INT C), Log2(Pow2) << mantissa))
+  //
+  // This is only queried after we have verified the transform will be bitwise
+  // equals.
>> RKSimon (Done): Describe/tag the purpose of each argument or use the same names in the patterns to make it more obvious.
+  //
+  // SDNode *N : The FDiv/FMul node we want to transform.
+  // SDValue FPConst: The Float constant operand in `N`.
+  // SDValue IntPow2: The Integer power of 2 operand in `N`.
+  virtual bool optimizeFMulOrFDivAsShiftAddBitcast(SDNode *N, SDValue FPConst,
+                                                   SDValue IntPow2) const {
>> RKSimon (Done): (style) Remove the (void) lines
+    // Default to avoiding fdiv which is often very expensive.
+    return N->getOpcode() == ISD::FDIV;
+  }
>> RKSimon (Done): This should probably be inside TargetLoweringBase near the top of the file - somewhere near shouldProduceAndByConstByHoistingConstFromShiftsLHSOfAnd - that's where we put the per-target optimization controls switches.

   /// Return true if Op can create undef or poison from non-undef & non-poison
   /// operands. The DemandedElts argument limits the check to the requested
   /// vector elements.
   virtual bool
   canCreateUndefOrPoisonForTargetNode(SDValue Op, const APInt &DemandedElts,
                                       const SelectionDAG &DAG, bool PoisonOnly,
                                       bool ConsiderFlags, unsigned Depth) const;

[1,370 lines hidden]

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

[16,131 lines hidden] SDValue DAGCombiner::visitFSUB(SDNode *N) {
   if (SDValue Fused = visitFSUBForFMACombine<EmptyMatchContext>(N)) {
     AddToWorklist(Fused.getNode());
     return Fused;
   }

   return SDValue();
 }

+// This is basically just a port of takeLog2 from InstCombineMulDivRem.cpp
+//
+// Returns the node that represents `Log2(Op)`. This may create a new node. If
+// we are unable to compute `Log2(Op)` its return `SDValue()`.
+//
+// All nodes will be created at `DL` and the output will be of type `VT`.
+//
+// This will only return `Log2(Op)` if we can prove `Op` is non-zero. Set
+// `AssumeNonZero` if this function should simply assume (not require proving
+// `Op` is non-zero).
+static SDValue takeLog2(SelectionDAG &DAG, SDLoc DL, EVT VT, SDValue Op,
+                        unsigned Depth, bool AssumeNonZero) {
+  assert(VT.isInteger() && "Only integer types are supported!");
>> RKSimon (Done): Can this happen? Make it an assert if you must.
+
+  auto PeekThroughCastsAndTrunc = [](SDValue V) {
>> RKSimon (Done): Remove MaxDepth and use SelectionDAG::MaxRecursionDepth
+    while (true) {
+      switch (V.getOpcode()) {
+      case ISD::TRUNCATE:
+      case ISD::ZERO_EXTEND:
>> RKSimon (Done): If we allow bitcast can't we end with cases where we've flipped between vectors/scalars or changed vectorelementcount?
>> goldstein.w.n (Done): Dropped bitcast peeking here. It doesn't really add any value because we can't handle any of the cases where a bitcast would be needed (Int <-> Vec <-> FP).
+      case ISD::ANY_EXTEND:
+        V = V.getOperand(0);
>> arsenm (Done): ANY_EXTEND seems dangerous here
+        break;
+      default:
+        return V;
+      }
+    }
+  };
+
+  Op = PeekThroughCastsAndTrunc(Op);
+
+  // Helper for determining whether a value is a power-2 constant scalar or a
+  // vector of such elements.
+  SmallVector<APInt> Pow2Constants;
+  bool AllSame = false;
+  auto IsPowerOfTwo = [&AllSame, &Pow2Constants](ConstantSDNode *C) {
+    if (C->isZero() || C->isOpaque())
+      return false;
+    // TODO: We may also be able to support negative powers of 2 here.
+    if (C->getAPIntValue().isPowerOf2()) {
>> RKSimon (Done): You can use ISD::matchUnaryPredicate to support non-uniform vectors here - see isDivisorPowerOfTwo for a similar case.
+      if (!Pow2Constants.empty())
+        AllSame &= C->getAPIntValue().eq(Pow2Constants.back());
>> RKSimon (Done): Standard practice in DAG is to increment Depth at a recursive functional call, not inside it.
+      Pow2Constants.emplace_back(C->getAPIntValue());
+      return true;
+    }
+    return false;
+  };
+
+  if (ISD::matchUnaryPredicate(Op, IsPowerOfTwo)) {
+    if (AllSame || !VT.isVector())
>> arsenm (Done): You don't need to bother special casing the splat case. getConstant on a vector will just split this out to a build_vector of constants anyway
+      return DAG.getConstant(Pow2Constants.back().logBase2(), DL, VT);
+    // We need to create a build vector
+    SmallVector<SDValue> Log2Ops;
+    for (const APInt &Pow2 : Pow2Constants)
+      Log2Ops.emplace_back(
+          DAG.getConstant(Pow2.logBase2(), DL, VT.getScalarType()));
+    return DAG.getBuildVector(VT, DL, Log2Ops);
+  }
+
+  if (Depth >= DAG.MaxRecursionDepth)
+    return SDValue();
+
+  auto CastToVT = [&](EVT NewVT, SDValue ToCast) {
>> RKSimon (Done): Use Depth + 1 here and adjust the depth tolerance above
>> arsenm (Not Done): I'd kind of expect there to be a helper that does this already?
>> goldstein.w.n (Done): There is `DAG.getZExtOrTrunc` but it doesn't check bitcast or if oldvt == newvt.
+    EVT CurVT = ToCast.getValueType();
+    ToCast = PeekThroughCastsAndTrunc(ToCast);
+    if (NewVT == CurVT)
+      return ToCast;
+
+    if (NewVT.getSizeInBits() == CurVT.getSizeInBits())
+      return DAG.getBitcast(NewVT, ToCast);
+
>> RKSimon (Done): Use Depth + 1 here and adjust the depth tolerance above
+    return DAG.getZExtOrTrunc(ToCast, DL, NewVT);
+  };
>> RKSimon (Done): Use Depth + 1 here and adjust the depth tolerance above
+
+  // log2(X << Y) -> log2(X) + Y
+  if (Op.getOpcode() == ISD::SHL) {
+    // 1 << Y and X nuw/nsw << Y are all non-zero.
+    if (AssumeNonZero || isOneConstant(Op.getOperand(0)) ||
>> arsenm (Done): Move isOneConstant as the last condition
+        Op->getFlags().hasNoUnsignedWrap() || Op->getFlags().hasNoSignedWrap())
+      if (SDValue LogX =
+              takeLog2(DAG, DL, VT, Op.getOperand(0), Depth + 1, AssumeNonZero))
+        return DAG.getNode(ISD::ADD, DL, VT, LogX,
>> RKSimon (Done): Use Depth + 1 here and adjust the depth tolerance above
+                           CastToVT(VT, Op.getOperand(1)));
+  }
>> RKSimon (Done): Use Depth + 1 here and adjust the depth tolerance above
+
+  // c ? X : Y -> c ? Log2(X) : Log2(Y)
+  if (Op.getOpcode() == ISD::SELECT) {
+    if (SDValue LogX =
+            takeLog2(DAG, DL, VT, Op.getOperand(1), Depth + 1, AssumeNonZero))
+      if (SDValue LogY =
+              takeLog2(DAG, DL, VT, Op.getOperand(2), Depth + 1, AssumeNonZero))
+        return DAG.getSelect(DL, VT, Op.getOperand(0), LogX, LogY);
+  }
+
+  // log2(umin(X, Y)) -> umin(log2(X), log2(Y))
+  // log2(umax(X, Y)) -> umax(log2(X), log2(Y))
+  if (Op.getOpcode() == ISD::UMIN || Op.getOpcode() == ISD::UMAX) {
+    // Use AssumeNonZero as false here. Otherwise we can hit case where
+    // log2(umax(X, Y)) != umax(log2(X), log2(Y)) (because overflow).
+    if (SDValue LogX = takeLog2(DAG, DL, VT, Op.getOperand(0), Depth + 1,
+                                /*AssumeNonZero*/ false))
+      if (SDValue LogY = takeLog2(DAG, DL, VT, Op.getOperand(1), Depth + 1,
+                                  /*AssumeNonZero*/ false))
+        return DAG.getNode(Op.getOpcode(), DL, VT, LogX, LogY);
+  }
+
+  return SDValue();
+}

		// Transform IEEE Floats:
		// (fmul C, (uitofp Pow2))
		// -> (bitcast_to_FP (add (bitcast_to_INT C), Log2(Pow2) << mantissa))
		// (fdiv C, (uitofp Pow2))
		// -> (bitcast_to_FP (sub (bitcast_to_INT C), Log2(Pow2) << mantissa))
		//
		// The rationale is fmul/fdiv by a power of 2 is just change the exponent, so
		// there is no need for more than an add/sub.
		//
		RKSimonUnsubmitted Done Reply Inline Actions Don't you need to ensure that the scalar size in bits of stays the same through the bitcast? AFAICT you should only accept bitcasts that casts from fp to/from int (both scalar/vector and same scalar width)? RKSimon: Don't you need to ensure that the scalar size in bits of stays the same through the bitcast?
		goldstein.w.nAuthorUnsubmitted Done Reply Inline Actions When we takelog2 it will cast the new SDValue to proper integer type. But dropping peekthroughbitcasts here as it doesn't really add any value. goldstein.w.n: When we takelog2 it will cast the new SDValue to proper integer type. But dropping…
		// This is valid under the following circumstances:
		// 1) We are dealing with IEEE floats
		RKSimonUnsubmitted Done Reply Inline Actions Why not make this a DAGCombiner method directly? RKSimon: Why not make this a DAGCombiner method directly?
		// 2) C is normal
		// 3) The fmul/fdiv add/sub will not go outside of min/max exponent bounds.
		static SDValue combineFMulOrFDivWithIntPow2(DAGCombiner *DC,
		const TargetLowering &TLI,
		SDNode *N) {
		SelectionDAG &DAG = DC->getDAG();
		EVT VT = N->getValueType(0);
		SDValue ConstOp, Pow2Op;

		int Mantissa = -1;
		auto GetConstAndPow2Ops = [&](unsigned ConstOpIdx) {
		if (ConstOpIdx == 1 && N->getOpcode() == ISD::FDIV)
		return false;

		ConstOp = peekThroughBitcasts(N->getOperand(ConstOpIdx));
		Pow2Op = N->getOperand(1 - ConstOpIdx);
		if (Pow2Op.getOpcode() != ISD::UINT_TO_FP &&
		(Pow2Op.getOpcode() != ISD::SINT_TO_FP \|\|
		!DAG.computeKnownBits(Pow2Op).isNonNegative()))
		return false;

		Pow2Op = Pow2Op.getOperand(0);

		// TODO(1): We may be able to include undefs.
		// TODO(2): We could also handle non-splat vector types.
		ConstantFPSDNode *CFP =
		isConstOrConstSplatFP(ConstOp, /AllowUndefs/ false);
		if (CFP == nullptr)
		return false;
		const APFloat &APF = CFP->getValueAPF();

		// Make sure we have normal/ieee constant.
		if (!APF.isNormal() \|\| !APF.isIEEE())
		return false;

		// `Log2(Pow2Op) < Pow2Op.getScalarSizeInBits()`.
		// TODO: We could use knownbits to make this bound more precise.
		int MaxExpChange = Pow2Op.getValueType().getScalarSizeInBits();

		// Make sure the floats exponent is within the bounds that this transform
		// produces bitwise equals value.
		int CurExp = ilogb(APF);
		// FMul by pow2 will only increase exponent.
		int MinExp = N->getOpcode() == ISD::FMUL ? CurExp : (CurExp - MaxExpChange);
		// FDiv by pow2 will only decrease exponent.
		int MaxExp = N->getOpcode() == ISD::FDIV ? CurExp : (CurExp + MaxExpChange);
		if (MinExp <= APFloat::semanticsMinExponent(APF.getSemantics()) \|\|
		MaxExp >= APFloat::semanticsMaxExponent(APF.getSemantics()))
		return false;

		// Finally make sure we actually know the mantissa for the float type.
		Mantissa = APFloat::semanticsPrecision(APF.getSemantics()) - 1;
		return Mantissa > 0;
		};

		if (!GetConstAndPow2Ops(0) && !GetConstAndPow2Ops(1))
		return SDValue();

		if (!TLI.optimizeFMulOrFDivAsShiftAddBitcast(N, ConstOp, Pow2Op))
		return SDValue();

		// Get log2 after all other checks have taken place. This is because takeLog2
		// may create a new node.
		SDLoc DL(N);
		// Get Log2 type with same bitwidth as the float type (VT).
		EVT NewIntVT = EVT::getIntegerVT(*DAG.getContext(), VT.getScalarSizeInBits());
		if (VT.isVector())
		NewIntVT = EVT::getVectorVT(*DAG.getContext(), NewIntVT,
		VT.getVectorNumElements());

		SDValue Log2 = takeLog2(DAG, DL, NewIntVT, Pow2Op, /*Depth*/ 0,
		DAG.isKnownNeverZero(Pow2Op));
		if (!Log2)
		return SDValue();

		// Perform actual transform.
		SDValue MantissaShiftCnt =
		DAG.getConstant(Mantissa, DL, DC->getShiftAmountTy(NewIntVT));
		// TODO: Sometimes Log2 is of form `(X + C)`. `(X + C) << C1` should fold to
		// `(X << C1) + (C << C1)`, but that isn't always the case because of the
		// cast. We could implement that here by handling the casts.
		SDValue Shift = DAG.getNode(ISD::SHL, DL, NewIntVT, Log2, MantissaShiftCnt);
		SDValue ResAsInt =
		DAG.getNode(N->getOpcode() == ISD::FMUL ? ISD::ADD : ISD::SUB, DL,
		NewIntVT, DAG.getBitcast(NewIntVT, ConstOp), Shift);
		SDValue ResAsFP = DAG.getBitcast(VT, ResAsInt);
		return ResAsFP;
		}
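
The arithmetic behind the fold above can be sketched outside of SelectionDAG. This is an illustrative standalone example, not code from the patch: the helper names (`floatBits`, `floatFromBits`, `mulByPow2`, `divByPow2`) are hypothetical, and it assumes `float` is IEEE binary32 (23 explicit mantissa bits), that `C` is normal, and that the resulting exponent stays within the normal range.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helpers (not part of the patch): reinterpret float <-> uint32_t.
// std::memcpy is used so the bitcast is well defined in C++.
uint32_t floatBits(float F) {
  uint32_t U;
  std::memcpy(&U, &F, sizeof U);
  return U;
}

float floatFromBits(uint32_t U) {
  float F;
  std::memcpy(&F, &U, sizeof F);
  return F;
}

// C * 2^Log2: add Log2 to the biased exponent field, which starts at bit 23
// for IEEE binary32. Only meaningful when C is normal and the exponent does
// not overflow (assumption stated in the lead-in).
float mulByPow2(float C, uint32_t Log2) {
  return floatFromBits(floatBits(C) + (Log2 << 23));
}

// C / 2^Log2: subtract from the exponent field instead.
float divByPow2(float C, uint32_t Log2) {
  return floatFromBits(floatBits(C) - (Log2 << 23));
}
```

For example, `mulByPow2(9.0f, 3)` yields `72.0f` and `divByPow2(9.0f, 2)` yields `2.25f`. The bit pattern of `9.0f` is `0x41100000`, the immediate that appears in the updated `fmul_pow_select` checks.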

SDValue DAGCombiner::visitFMUL(SDNode *N) {		SDValue DAGCombiner::visitFMUL(SDNode *N) {
SDValue N0 = N->getOperand(0);		SDValue N0 = N->getOperand(0);
SDValue N1 = N->getOperand(1);		SDValue N1 = N->getOperand(1);
ConstantFPSDNode *N1CFP = isConstOrConstSplatFP(N1, true);		ConstantFPSDNode *N1CFP = isConstOrConstSplatFP(N1, true);
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);
SDLoc DL(N);		SDLoc DL(N);
const TargetOptions &Options = DAG.getTarget().Options;		const TargetOptions &Options = DAG.getTarget().Options;
const SDNodeFlags Flags = N->getFlags();		const SDNodeFlags Flags = N->getFlags();
▲ Show 20 Lines • Show All 124 Lines • ▼ Show 20 Lines	SDValue DAGCombiner::visitFMUL(SDNode *N) {
}		}

// FMUL -> FMA combines:		// FMUL -> FMA combines:
if (SDValue Fused = visitFMULForFMADistributiveCombine(N)) {		if (SDValue Fused = visitFMULForFMADistributiveCombine(N)) {
AddToWorklist(Fused.getNode());		AddToWorklist(Fused.getNode());
return Fused;		return Fused;
}		}

		if (SDValue R = combineFMulOrFDivWithIntPow2(this, this->TLI, N))
		arsenm: Can you add a test where this w
		goldstein.w.n: Where this w?
		arsenm: I moved the comment to the main box, the fma formation one
		goldstein.w.n: Okay, I added an fma test `fmul_pow_shl_cnt_vec_preserve_fma` and it does so (the call to this transform is explicitly after the fma fold. I added a comment as well to be clearer about that).
		return R;

return SDValue();		return SDValue();
}		}

template <class MatchContextClass> SDValue DAGCombiner::visitFMA(SDNode *N) {		template <class MatchContextClass> SDValue DAGCombiner::visitFMA(SDNode *N) {
SDValue N0 = N->getOperand(0);		SDValue N0 = N->getOperand(0);
SDValue N1 = N->getOperand(1);		SDValue N1 = N->getOperand(1);
SDValue N2 = N->getOperand(2);		SDValue N2 = N->getOperand(2);
ConstantFPSDNode *N0CFP = dyn_cast<ConstantFPSDNode>(N0);		ConstantFPSDNode *N0CFP = dyn_cast<ConstantFPSDNode>(N0);
▲ Show 20 Lines • Show All 335 Lines • ▼ Show 20 Lines	if (NegN0) {
HandleSDNode NegN0Handle(NegN0);		HandleSDNode NegN0Handle(NegN0);
SDValue NegN1 =		SDValue NegN1 =
TLI.getNegatedExpression(N1, DAG, LegalOperations, ForCodeSize, CostN1);		TLI.getNegatedExpression(N1, DAG, LegalOperations, ForCodeSize, CostN1);
if (NegN1 && (CostN0 == TargetLowering::NegatibleCost::Cheaper ||		if (NegN1 && (CostN0 == TargetLowering::NegatibleCost::Cheaper ||
CostN1 == TargetLowering::NegatibleCost::Cheaper))		CostN1 == TargetLowering::NegatibleCost::Cheaper))
return DAG.getNode(ISD::FDIV, SDLoc(N), VT, NegN0, NegN1);		return DAG.getNode(ISD::FDIV, SDLoc(N), VT, NegN0, NegN1);
}		}

		if (SDValue R = combineFMulOrFDivWithIntPow2(this, this->TLI, N))
		return R;

return SDValue();		return SDValue();
}		}

SDValue DAGCombiner::visitFREM(SDNode *N) {		SDValue DAGCombiner::visitFREM(SDNode *N) {
SDValue N0 = N->getOperand(0);		SDValue N0 = N->getOperand(0);
SDValue N1 = N->getOperand(1);		SDValue N1 = N->getOperand(1);
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);
SDNodeFlags Flags = N->getFlags();		SDNodeFlags Flags = N->getFlags();
▲ Show 20 Lines • Show All 10,255 Lines • ▼ Show 20 Lines
/// Determines the LogBase2 value for a non-null input value using the		/// Determines the LogBase2 value for a non-null input value using the
/// transform: LogBase2(V) = (EltBits - 1) - ctlz(V).		/// transform: LogBase2(V) = (EltBits - 1) - ctlz(V).
SDValue DAGCombiner::BuildLogBase2(SDValue V, const SDLoc &DL) {		SDValue DAGCombiner::BuildLogBase2(SDValue V, const SDLoc &DL) {
EVT VT = V.getValueType();		EVT VT = V.getValueType();
SDValue Ctlz = DAG.getNode(ISD::CTLZ, DL, VT, V);		SDValue Ctlz = DAG.getNode(ISD::CTLZ, DL, VT, V);
SDValue Base = DAG.getConstant(VT.getScalarSizeInBits() - 1, DL, VT);		SDValue Base = DAG.getConstant(VT.getScalarSizeInBits() - 1, DL, VT);
SDValue LogBase2 = DAG.getNode(ISD::SUB, DL, VT, Base, Ctlz);		SDValue LogBase2 = DAG.getNode(ISD::SUB, DL, VT, Base, Ctlz);
return LogBase2;		return LogBase2;
}		}
		RKSimon: (style) It'd be better to move the takeInexpensiveLog2 static implementation directly above BuildLogBase2 if that is its only user.
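
The `LogBase2(V) = (EltBits - 1) - ctlz(V)` identity used by `BuildLogBase2` can be checked with a small scalar sketch. The functions below are hypothetical stand-ins written for illustration, fixed at 64-bit elements, with a portable loop in place of a hardware count-leading-zeros instruction.

```cpp
#include <cstdint>

// Portable count-leading-zeros for a 64-bit value; returns 64 when V == 0.
unsigned ctlz64(uint64_t V) {
  unsigned N = 0;
  for (uint64_t Mask = uint64_t(1) << 63; Mask != 0 && (V & Mask) == 0;
       Mask >>= 1)
    ++N;
  return N;
}

// LogBase2(V) = (EltBits - 1) - ctlz(V) with EltBits == 64.
// The caller must pass a non-zero V, matching the "non-null input" contract
// in the doc comment above.
unsigned logBase2(uint64_t V) {
  return 63 - ctlz64(V);
}
```

For a power of two the result is exact, for example `logBase2(8192)` is `13`. For other non-zero inputs it is the floor of the base-2 logarithm.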

/// Newton iteration for a function: F(X) is X_{i+1} = X_i - F(X_i)/F'(X_i)		/// Newton iteration for a function: F(X) is X_{i+1} = X_i - F(X_i)/F'(X_i)
/// For the reciprocal, we need to find the zero of the function:		/// For the reciprocal, we need to find the zero of the function:
/// F(X) = 1/X - A [which has a zero at X = 1/A]		/// F(X) = 1/X - A [which has a zero at X = 1/A]
/// =>		/// =>
/// X_{i+1} = X_i (2 - A X_i) = X_i + X_i (1 - A X_i) [this second form		/// X_{i+1} = X_i (2 - A X_i) = X_i + X_i (1 - A X_i) [this second form
/// does not require additional intermediate precision]		/// does not require additional intermediate precision]
/// For the last iteration, put numerator N into it to gain more precision:		/// For the last iteration, put numerator N into it to gain more precision:
Show All 37 Lines	if (Iterations) {

SDValue NewEst = DAG.getNode(ISD::FMUL, DL, VT, Op, MulEst, Flags);		SDValue NewEst = DAG.getNode(ISD::FMUL, DL, VT, Op, MulEst, Flags);
AddToWorklist(NewEst.getNode());		AddToWorklist(NewEst.getNode());

NewEst = DAG.getNode(ISD::FSUB, DL, VT,		NewEst = DAG.getNode(ISD::FSUB, DL, VT,
(i == Iterations - 1 ? N : FPOne), NewEst, Flags);		(i == Iterations - 1 ? N : FPOne), NewEst, Flags);
AddToWorklist(NewEst.getNode());		AddToWorklist(NewEst.getNode());

NewEst = DAG.getNode(ISD::FMUL, DL, VT, Est, NewEst, Flags);		NewEst = DAG.getNode(ISD::FMUL, DL, VT, Est, NewEst, Flags);
AddToWorklist(NewEst.getNode());		AddToWorklist(NewEst.getNode());
		dmgreen: Hello. It looks like these should be the other way around? Otherwise CurVT might be == NewVT, but ToCast has now looked through a zext. Testcase: https://godbolt.org/z/xd7TbGa6r
		goldstein.w.n: Yup, fix will be posted shortly!

Est = DAG.getNode(ISD::FADD, DL, VT, MulEst, NewEst, Flags);		Est = DAG.getNode(ISD::FADD, DL, VT, MulEst, NewEst, Flags);
AddToWorklist(Est.getNode());		AddToWorklist(Est.getNode());
}		}
} else {		} else {
// If no iterations are available, multiply with N.		// If no iterations are available, multiply with N.
Est = DAG.getNode(ISD::FMUL, DL, VT, Est, N, Flags);		Est = DAG.getNode(ISD::FMUL, DL, VT, Est, N, Flags);
AddToWorklist(Est.getNode());		AddToWorklist(Est.getNode());
▲ Show 20 Lines • Show All 580 Lines • Show Last 20 Lines

llvm/lib/Target/X86/X86ISelLowering.h

Show First 20 Lines • Show All 1,790 Lines • ▼ Show 20 Lines	MachineBasicBlock *EmitSjLjDispatchBlock(MachineInstr &MI,
MachineBasicBlock *MBB) const;		MachineBasicBlock *MBB) const;

/// Emit flags for the given setcc condition and operands. Also returns the		/// Emit flags for the given setcc condition and operands. Also returns the
/// corresponding X86 condition code constant in X86CC.		/// corresponding X86 condition code constant in X86CC.
SDValue emitFlagsForSetcc(SDValue Op0, SDValue Op1, ISD::CondCode CC,		SDValue emitFlagsForSetcc(SDValue Op0, SDValue Op1, ISD::CondCode CC,
const SDLoc &dl, SelectionDAG &DAG,		const SDLoc &dl, SelectionDAG &DAG,
SDValue &X86CC) const;		SDValue &X86CC) const;

		bool optimizeFMulOrFDivAsShiftAddBitcast(SDNode *N, SDValue FPConst,
		SDValue IntPow2) const override;

/// Check if replacement of SQRT with RSQRT should be disabled.		/// Check if replacement of SQRT with RSQRT should be disabled.
bool isFsqrtCheap(SDValue Op, SelectionDAG &DAG) const override;		bool isFsqrtCheap(SDValue Op, SelectionDAG &DAG) const override;

/// Use rsqrt* to speed up sqrt calculations.		/// Use rsqrt* to speed up sqrt calculations.
SDValue getSqrtEstimate(SDValue Op, SelectionDAG &DAG, int Enabled,		SDValue getSqrtEstimate(SDValue Op, SelectionDAG &DAG, int Enabled,
int &RefinementSteps, bool &UseOneConstNR,		int &RefinementSteps, bool &UseOneConstNR,
bool Reciprocal) const override;		bool Reciprocal) const override;

▲ Show 20 Lines • Show All 67 Lines • Show Last 20 Lines

llvm/lib/Target/X86/X86ISelLowering.cpp


Show First 20 Lines • Show All 25,022 Lines • ▼ Show 20 Lines	static SDValue EmitCmp(SDValue Op0, SDValue Op1, unsigned X86CC,
}		}

// Use SUB instead of CMP to enable CSE between SUB and CMP.		// Use SUB instead of CMP to enable CSE between SUB and CMP.
SDVTList VTs = DAG.getVTList(CmpVT, MVT::i32);		SDVTList VTs = DAG.getVTList(CmpVT, MVT::i32);
SDValue Sub = DAG.getNode(X86ISD::SUB, dl, VTs, Op0, Op1);		SDValue Sub = DAG.getNode(X86ISD::SUB, dl, VTs, Op0, Op1);
return Sub.getValue(1);		return Sub.getValue(1);
}		}

		bool X86TargetLowering::optimizeFMulOrFDivAsShiftAddBitcast(
		SDNode *N, SDValue, SDValue IntPow2) const {
		if (N->getOpcode() == ISD::FDIV)
		return true;

		EVT FPVT = N->getValueType(0);
		EVT IntVT = IntPow2.getValueType();

		// This indicates a non-free bitcast.
		if (FPVT.isVector() &&
		FPVT.getScalarSizeInBits() != IntVT.getScalarSizeInBits())
		return false;
		RKSimon: Add a TODO - as I'm sure we can be more aggressive here, vXi64 uitofp in particular on pre-AVX512 targets, but smaller integers are often worth it since we'd have to extend them anyway for the int-to-fp conversion.

		return true;
		}

/// Check if replacement of SQRT with RSQRT should be disabled.		/// Check if replacement of SQRT with RSQRT should be disabled.
bool X86TargetLowering::isFsqrtCheap(SDValue Op, SelectionDAG &DAG) const {		bool X86TargetLowering::isFsqrtCheap(SDValue Op, SelectionDAG &DAG) const {
EVT VT = Op.getValueType();		EVT VT = Op.getValueType();

// We don't need to replace SQRT with RSQRT for half type.		// We don't need to replace SQRT with RSQRT for half type.
if (VT.getScalarType() == MVT::f16)		if (VT.getScalarType() == MVT::f16)
return true;		return true;

▲ Show 20 Lines • Show All 32,759 Lines • Show Last 20 Lines

llvm/test/CodeGen/X86/fold-int-pow2-with-fmul-or-fdiv.ll

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 3		; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 3
; RUN: llc < %s -mtriple=x86_64-unknown-unknown | FileCheck %s --check-prefixes=CHECK-SSE		; RUN: llc < %s -mtriple=x86_64-unknown-unknown | FileCheck %s --check-prefixes=CHECK-SSE
; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx2 | FileCheck %s --check-prefixes=CHECK-AVX,CHECK-AVX2		; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx2 | FileCheck %s --check-prefixes=CHECK-AVX,CHECK-AVX2
; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512f | FileCheck %s --check-prefixes=CHECK-AVX,CHECK-AVX512F		; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx512f | FileCheck %s --check-prefixes=CHECK-AVX,CHECK-AVX512F

declare i16 @llvm.umax.i16(i16, i16)		declare i16 @llvm.umax.i16(i16, i16)
declare i64 @llvm.umin.i64(i64, i64)		declare i64 @llvm.umin.i64(i64, i64)

define double @fmul_pow_shl_cnt(i64 %cnt) {		define double @fmul_pow_shl_cnt(i64 %cnt) {
; CHECK-SSE-LABEL: fmul_pow_shl_cnt:		; CHECK-SSE-LABEL: fmul_pow_shl_cnt:
; CHECK-SSE: # %bb.0:		; CHECK-SSE: # %bb.0:
; CHECK-SSE-NEXT: movq %rdi, %rcx		; CHECK-SSE-NEXT: shlq $52, %rdi
; CHECK-SSE-NEXT: movl $1, %eax		; CHECK-SSE-NEXT: movabsq $4621256167635550208, %rax # imm = 0x4022000000000000
; CHECK-SSE-NEXT: # kill: def $cl killed $cl killed $rcx		; CHECK-SSE-NEXT: addq %rdi, %rax
; CHECK-SSE-NEXT: shlq %cl, %rax		; CHECK-SSE-NEXT: movq %rax, %xmm0
; CHECK-SSE-NEXT: movq %rax, %xmm1
; CHECK-SSE-NEXT: punpckldq {{.*#+}} xmm1 = xmm1[0],mem[0],xmm1[1],mem[1]
; CHECK-SSE-NEXT: subpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
; CHECK-SSE-NEXT: movapd %xmm1, %xmm0
; CHECK-SSE-NEXT: unpckhpd {{.*#+}} xmm0 = xmm0[1],xmm1[1]
; CHECK-SSE-NEXT: addsd %xmm1, %xmm0
; CHECK-SSE-NEXT: mulsd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
; CHECK-SSE-NEXT: retq		; CHECK-SSE-NEXT: retq
;		;
; CHECK-AVX2-LABEL: fmul_pow_shl_cnt:		; CHECK-AVX-LABEL: fmul_pow_shl_cnt:
; CHECK-AVX2: # %bb.0:		; CHECK-AVX: # %bb.0:
; CHECK-AVX2-NEXT: movq %rdi, %rcx		; CHECK-AVX-NEXT: shlq $52, %rdi
; CHECK-AVX2-NEXT: movl $1, %eax		; CHECK-AVX-NEXT: movabsq $4621256167635550208, %rax # imm = 0x4022000000000000
; CHECK-AVX2-NEXT: # kill: def $cl killed $cl killed $rcx		; CHECK-AVX-NEXT: addq %rdi, %rax
; CHECK-AVX2-NEXT: shlq %cl, %rax		; CHECK-AVX-NEXT: vmovq %rax, %xmm0
; CHECK-AVX2-NEXT: vmovq %rax, %xmm0		; CHECK-AVX-NEXT: retq
; CHECK-AVX2-NEXT: vpunpckldq {{.*#+}} xmm0 = xmm0[0],mem[0],xmm0[1],mem[1]
; CHECK-AVX2-NEXT: vsubpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
; CHECK-AVX2-NEXT: vshufpd {{.*#+}} xmm1 = xmm0[1,0]
; CHECK-AVX2-NEXT: vaddsd %xmm0, %xmm1, %xmm0
; CHECK-AVX2-NEXT: vmulsd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
; CHECK-AVX2-NEXT: retq
;
; CHECK-AVX512F-LABEL: fmul_pow_shl_cnt:
; CHECK-AVX512F: # %bb.0:
; CHECK-AVX512F-NEXT: movq %rdi, %rcx
; CHECK-AVX512F-NEXT: movl $1, %eax
; CHECK-AVX512F-NEXT: # kill: def $cl killed $cl killed $rcx
; CHECK-AVX512F-NEXT: shlq %cl, %rax
; CHECK-AVX512F-NEXT: vcvtusi2sd %rax, %xmm0, %xmm0
; CHECK-AVX512F-NEXT: vmulsd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
; CHECK-AVX512F-NEXT: retq
%shl = shl nuw i64 1, %cnt		%shl = shl nuw i64 1, %cnt
%conv = uitofp i64 %shl to double		%conv = uitofp i64 %shl to double
%mul = fmul double 9.000000e+00, %conv		%mul = fmul double 9.000000e+00, %conv
ret double %mul		ret double %mul
}		}
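
The new CHECK lines for `fmul_pow_shl_cnt` (`shlq $52`, `movabsq $0x4022000000000000`, `addq`) can be read as plain integer arithmetic on the bits of `9.0`. The following is a rough model written for illustration only: the names `doubleBits`, `doubleFromBits` and `fmulPowShlCnt` are hypothetical, and it assumes `double` is IEEE binary64 (52 explicit mantissa bits).

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helpers: reinterpret double <-> uint64_t via memcpy.
uint64_t doubleBits(double D) {
  uint64_t U;
  std::memcpy(&U, &D, sizeof U);
  return U;
}

double doubleFromBits(uint64_t U) {
  double D;
  std::memcpy(&D, &U, sizeof D);
  return D;
}

// Models the folded sequence for `fmul double 9.0, uitofp(1 << Cnt)`:
//   shlq   $52, %rdi        -> Cnt << 52
//   movabsq $bits(9.0), %rax
//   addq   %rdi, %rax       -> bits(9.0) + (Cnt << 52)
// Cnt is assumed to be below 64, as `shl nuw i64 1, %cnt` requires.
double fmulPowShlCnt(uint64_t Cnt) {
  return doubleFromBits(doubleBits(9.0) + (Cnt << 52));
}
```

Here `doubleBits(9.0)` equals `0x4022000000000000`, the immediate in the `movabsq`, and `fmulPowShlCnt(4)` equals `144.0`, which is `9.0 * 16`.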

define double @fmul_pow_shl_cnt2(i64 %cnt) {		define double @fmul_pow_shl_cnt2(i64 %cnt) {
; CHECK-SSE-LABEL: fmul_pow_shl_cnt2:		; CHECK-SSE-LABEL: fmul_pow_shl_cnt2:
; CHECK-SSE: # %bb.0:		; CHECK-SSE: # %bb.0:
; CHECK-SSE-NEXT: movq %rdi, %rcx		; CHECK-SSE-NEXT: incl %edi
; CHECK-SSE-NEXT: movl $2, %eax		; CHECK-SSE-NEXT: shlq $52, %rdi
; CHECK-SSE-NEXT: # kill: def $cl killed $cl killed $rcx		; CHECK-SSE-NEXT: movabsq $-4602115869219225600, %rax # imm = 0xC022000000000000
; CHECK-SSE-NEXT: shlq %cl, %rax		; CHECK-SSE-NEXT: addq %rdi, %rax
; CHECK-SSE-NEXT: movq %rax, %xmm1		; CHECK-SSE-NEXT: movq %rax, %xmm0
; CHECK-SSE-NEXT: punpckldq {{.*#+}} xmm1 = xmm1[0],mem[0],xmm1[1],mem[1]
; CHECK-SSE-NEXT: subpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
; CHECK-SSE-NEXT: movapd %xmm1, %xmm0
; CHECK-SSE-NEXT: unpckhpd {{.*#+}} xmm0 = xmm0[1],xmm1[1]
; CHECK-SSE-NEXT: addsd %xmm1, %xmm0
; CHECK-SSE-NEXT: mulsd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
; CHECK-SSE-NEXT: retq		; CHECK-SSE-NEXT: retq
;		;
; CHECK-AVX2-LABEL: fmul_pow_shl_cnt2:		; CHECK-AVX-LABEL: fmul_pow_shl_cnt2:
; CHECK-AVX2: # %bb.0:		; CHECK-AVX: # %bb.0:
; CHECK-AVX2-NEXT: movq %rdi, %rcx		; CHECK-AVX-NEXT: incl %edi
; CHECK-AVX2-NEXT: movl $2, %eax		; CHECK-AVX-NEXT: shlq $52, %rdi
; CHECK-AVX2-NEXT: # kill: def $cl killed $cl killed $rcx		; CHECK-AVX-NEXT: movabsq $-4602115869219225600, %rax # imm = 0xC022000000000000
; CHECK-AVX2-NEXT: shlq %cl, %rax		; CHECK-AVX-NEXT: addq %rdi, %rax
; CHECK-AVX2-NEXT: vmovq %rax, %xmm0		; CHECK-AVX-NEXT: vmovq %rax, %xmm0
; CHECK-AVX2-NEXT: vpunpckldq {{.*#+}} xmm0 = xmm0[0],mem[0],xmm0[1],mem[1]		; CHECK-AVX-NEXT: retq
; CHECK-AVX2-NEXT: vsubpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
; CHECK-AVX2-NEXT: vshufpd {{.*#+}} xmm1 = xmm0[1,0]
; CHECK-AVX2-NEXT: vaddsd %xmm0, %xmm1, %xmm0
; CHECK-AVX2-NEXT: vmulsd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
; CHECK-AVX2-NEXT: retq
;
; CHECK-AVX512F-LABEL: fmul_pow_shl_cnt2:
; CHECK-AVX512F: # %bb.0:
; CHECK-AVX512F-NEXT: movq %rdi, %rcx
; CHECK-AVX512F-NEXT: movl $2, %eax
; CHECK-AVX512F-NEXT: # kill: def $cl killed $cl killed $rcx
; CHECK-AVX512F-NEXT: shlq %cl, %rax
; CHECK-AVX512F-NEXT: vcvtusi2sd %rax, %xmm0, %xmm0
; CHECK-AVX512F-NEXT: vmulsd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
; CHECK-AVX512F-NEXT: retq
%shl = shl nuw i64 2, %cnt		%shl = shl nuw i64 2, %cnt
%conv = uitofp i64 %shl to double		%conv = uitofp i64 %shl to double
%mul = fmul double -9.000000e+00, %conv		%mul = fmul double -9.000000e+00, %conv
ret double %mul		ret double %mul
}		}

define float @fmul_pow_select(i32 %cnt, i1 %c) {		define float @fmul_pow_select(i32 %cnt, i1 %c) {
; CHECK-SSE-LABEL: fmul_pow_select:		; CHECK-SSE-LABEL: fmul_pow_select:
; CHECK-SSE: # %bb.0:		; CHECK-SSE: # %bb.0:
; CHECK-SSE-NEXT: movl %edi, %ecx		; CHECK-SSE-NEXT: # kill: def $edi killed $edi def $rdi
; CHECK-SSE-NEXT: andl $1, %esi		; CHECK-SSE-NEXT: leal 1(%rdi), %eax
; CHECK-SSE-NEXT: movl $2, %eax		; CHECK-SSE-NEXT: testb $1, %sil
; CHECK-SSE-NEXT: subl %esi, %eax		; CHECK-SSE-NEXT: cmovnel %edi, %eax
; CHECK-SSE-NEXT: # kill: def $cl killed $cl killed $ecx		; CHECK-SSE-NEXT: shll $23, %eax
; CHECK-SSE-NEXT: shll %cl, %eax		; CHECK-SSE-NEXT: addl $1091567616, %eax # imm = 0x41100000
; CHECK-SSE-NEXT: cvtsi2ss %rax, %xmm0		; CHECK-SSE-NEXT: movd %eax, %xmm0
; CHECK-SSE-NEXT: mulss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
; CHECK-SSE-NEXT: retq		; CHECK-SSE-NEXT: retq
;		;
; CHECK-AVX2-LABEL: fmul_pow_select:		; CHECK-AVX-LABEL: fmul_pow_select:
; CHECK-AVX2: # %bb.0:		; CHECK-AVX: # %bb.0:
; CHECK-AVX2-NEXT: movl %edi, %ecx		; CHECK-AVX-NEXT: # kill: def $edi killed $edi def $rdi
; CHECK-AVX2-NEXT: andl $1, %esi		; CHECK-AVX-NEXT: leal 1(%rdi), %eax
; CHECK-AVX2-NEXT: movl $2, %eax		; CHECK-AVX-NEXT: testb $1, %sil
; CHECK-AVX2-NEXT: subl %esi, %eax		; CHECK-AVX-NEXT: cmovnel %edi, %eax
; CHECK-AVX2-NEXT: # kill: def $cl killed $cl killed $ecx		; CHECK-AVX-NEXT: shll $23, %eax
; CHECK-AVX2-NEXT: shll %cl, %eax		; CHECK-AVX-NEXT: addl $1091567616, %eax # imm = 0x41100000
; CHECK-AVX2-NEXT: vcvtsi2ss %rax, %xmm0, %xmm0		; CHECK-AVX-NEXT: vmovd %eax, %xmm0
; CHECK-AVX2-NEXT: vmulss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0		; CHECK-AVX-NEXT: retq
; CHECK-AVX2-NEXT: retq
;
; CHECK-AVX512F-LABEL: fmul_pow_select:
; CHECK-AVX512F: # %bb.0:
; CHECK-AVX512F-NEXT: movl %edi, %ecx
; CHECK-AVX512F-NEXT: andl $1, %esi
; CHECK-AVX512F-NEXT: movl $2, %eax
; CHECK-AVX512F-NEXT: subl %esi, %eax
; CHECK-AVX512F-NEXT: # kill: def $cl killed $cl killed $ecx
; CHECK-AVX512F-NEXT: shll %cl, %eax
; CHECK-AVX512F-NEXT: vcvtusi2ss %eax, %xmm0, %xmm0
; CHECK-AVX512F-NEXT: vmulss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
; CHECK-AVX512F-NEXT: retq
%shl2 = shl nuw i32 2, %cnt		%shl2 = shl nuw i32 2, %cnt
%shl1 = shl nuw i32 1, %cnt		%shl1 = shl nuw i32 1, %cnt
%shl = select i1 %c, i32 %shl1, i32 %shl2		%shl = select i1 %c, i32 %shl1, i32 %shl2
%conv = uitofp i32 %shl to float		%conv = uitofp i32 %shl to float
%mul = fmul float 9.000000e+00, %conv		%mul = fmul float 9.000000e+00, %conv
ret float %mul		ret float %mul
}		}

define float @fmul_fly_pow_mul_min_pow2(i64 %cnt) {		define float @fmul_fly_pow_mul_min_pow2(i64 %cnt) {
; CHECK-SSE-LABEL: fmul_fly_pow_mul_min_pow2:		; CHECK-SSE-LABEL: fmul_fly_pow_mul_min_pow2:
; CHECK-SSE: # %bb.0:		; CHECK-SSE: # %bb.0:
; CHECK-SSE-NEXT: movq %rdi, %rcx		; CHECK-SSE-NEXT: addl $3, %edi
; CHECK-SSE-NEXT: movl $8, %eax		; CHECK-SSE-NEXT: cmpl $13, %edi
; CHECK-SSE-NEXT: # kill: def $cl killed $cl killed $rcx		; CHECK-SSE-NEXT: movl $13, %eax
; CHECK-SSE-NEXT: shlq %cl, %rax		; CHECK-SSE-NEXT: cmovbl %edi, %eax
; CHECK-SSE-NEXT: cmpq $8192, %rax # imm = 0x2000		; CHECK-SSE-NEXT: shll $23, %eax
; CHECK-SSE-NEXT: movl $8192, %ecx # imm = 0x2000		; CHECK-SSE-NEXT: addl $1091567616, %eax # imm = 0x41100000
; CHECK-SSE-NEXT: cmovbq %rax, %rcx		; CHECK-SSE-NEXT: movd %eax, %xmm0
; CHECK-SSE-NEXT: cvtsi2ss %rcx, %xmm0
; CHECK-SSE-NEXT: mulss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
; CHECK-SSE-NEXT: retq		; CHECK-SSE-NEXT: retq
;		;
; CHECK-AVX-LABEL: fmul_fly_pow_mul_min_pow2:		; CHECK-AVX-LABEL: fmul_fly_pow_mul_min_pow2:
; CHECK-AVX: # %bb.0:		; CHECK-AVX: # %bb.0:
; CHECK-AVX-NEXT: movq %rdi, %rcx		; CHECK-AVX-NEXT: addl $3, %edi
; CHECK-AVX-NEXT: movl $8, %eax		; CHECK-AVX-NEXT: cmpl $13, %edi
; CHECK-AVX-NEXT: # kill: def $cl killed $cl killed $rcx		; CHECK-AVX-NEXT: movl $13, %eax
; CHECK-AVX-NEXT: shlq %cl, %rax		; CHECK-AVX-NEXT: cmovbl %edi, %eax
; CHECK-AVX-NEXT: cmpq $8192, %rax # imm = 0x2000		; CHECK-AVX-NEXT: shll $23, %eax
; CHECK-AVX-NEXT: movl $8192, %ecx # imm = 0x2000		; CHECK-AVX-NEXT: addl $1091567616, %eax # imm = 0x41100000
; CHECK-AVX-NEXT: cmovbq %rax, %rcx		; CHECK-AVX-NEXT: vmovd %eax, %xmm0
; CHECK-AVX-NEXT: vcvtsi2ss %rcx, %xmm0, %xmm0
; CHECK-AVX-NEXT: vmulss {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
; CHECK-AVX-NEXT: retq		; CHECK-AVX-NEXT: retq
%shl8 = shl nuw i64 8, %cnt		%shl8 = shl nuw i64 8, %cnt
%shl = call i64 @llvm.umin.i64(i64 %shl8, i64 8192)		%shl = call i64 @llvm.umin.i64(i64 %shl8, i64 8192)
%conv = uitofp i64 %shl to float		%conv = uitofp i64 %shl to float
%mul = fmul float 9.000000e+00, %conv		%mul = fmul float 9.000000e+00, %conv
ret float %mul		ret float %mul
}		}

define double @fmul_pow_mul_max_pow2(i16 %cnt) {		define double @fmul_pow_mul_max_pow2(i16 %cnt) {
; CHECK-SSE-LABEL: fmul_pow_mul_max_pow2:		; CHECK-SSE-LABEL: fmul_pow_mul_max_pow2:
; CHECK-SSE: # %bb.0:		; CHECK-SSE: # %bb.0:
; CHECK-SSE-NEXT: movl %edi, %ecx		; CHECK-SSE-NEXT: movl %edi, %eax
; CHECK-SSE-NEXT: movl $2, %eax		; CHECK-SSE-NEXT: leaq 1(%rax), %rcx
; CHECK-SSE-NEXT: shll %cl, %eax		; CHECK-SSE-NEXT: cmpq %rcx, %rax
; CHECK-SSE-NEXT: movl $1, %edx		; CHECK-SSE-NEXT: cmovaq %rax, %rcx
; CHECK-SSE-NEXT: # kill: def $cl killed $cl killed $ecx		; CHECK-SSE-NEXT: shlq $52, %rcx
; CHECK-SSE-NEXT: shll %cl, %edx		; CHECK-SSE-NEXT: movabsq $4613937818241073152, %rax # imm = 0x4008000000000000
; CHECK-SSE-NEXT: cmpw %ax, %dx		; CHECK-SSE-NEXT: addq %rcx, %rax
; CHECK-SSE-NEXT: cmovbel %eax, %edx		; CHECK-SSE-NEXT: movq %rax, %xmm0
; CHECK-SSE-NEXT: movzwl %dx, %eax
; CHECK-SSE-NEXT: cvtsi2sd %eax, %xmm0
; CHECK-SSE-NEXT: mulsd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
; CHECK-SSE-NEXT: retq		; CHECK-SSE-NEXT: retq
;		;
; CHECK-AVX-LABEL: fmul_pow_mul_max_pow2:		; CHECK-AVX-LABEL: fmul_pow_mul_max_pow2:
; CHECK-AVX: # %bb.0:		; CHECK-AVX: # %bb.0:
; CHECK-AVX-NEXT: movl %edi, %ecx		; CHECK-AVX-NEXT: movl %edi, %eax
; CHECK-AVX-NEXT: movl $2, %eax		; CHECK-AVX-NEXT: leaq 1(%rax), %rcx
; CHECK-AVX-NEXT: shll %cl, %eax		; CHECK-AVX-NEXT: cmpq %rcx, %rax
; CHECK-AVX-NEXT: movl $1, %edx		; CHECK-AVX-NEXT: cmovaq %rax, %rcx
; CHECK-AVX-NEXT: # kill: def $cl killed $cl killed $ecx		; CHECK-AVX-NEXT: shlq $52, %rcx
; CHECK-AVX-NEXT: shll %cl, %edx		; CHECK-AVX-NEXT: movabsq $4613937818241073152, %rax # imm = 0x4008000000000000
; CHECK-AVX-NEXT: cmpw %ax, %dx		; CHECK-AVX-NEXT: addq %rcx, %rax
; CHECK-AVX-NEXT: cmovbel %eax, %edx		; CHECK-AVX-NEXT: vmovq %rax, %xmm0
; CHECK-AVX-NEXT: movzwl %dx, %eax
; CHECK-AVX-NEXT: vcvtsi2sd %eax, %xmm0, %xmm0
; CHECK-AVX-NEXT: vmulsd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
; CHECK-AVX-NEXT: retq		; CHECK-AVX-NEXT: retq
%shl2 = shl nuw i16 2, %cnt		%shl2 = shl nuw i16 2, %cnt
%shl1 = shl nuw i16 1, %cnt		%shl1 = shl nuw i16 1, %cnt
%shl = call i16 @llvm.umax.i16(i16 %shl1, i16 %shl2)		%shl = call i16 @llvm.umax.i16(i16 %shl1, i16 %shl2)
%conv = uitofp i16 %shl to double		%conv = uitofp i16 %shl to double
%mul = fmul double 3.000000e+00, %conv		%mul = fmul double 3.000000e+00, %conv
ret double %mul		ret double %mul
}		}
▲ Show 20 Lines • Show All 121 Lines • ▼ Show 20 Lines	; CHECK-AVX512F-NEXT: retq
%conv = uitofp <2 x i64> %shl to <2 x float>		%conv = uitofp <2 x i64> %shl to <2 x float>
%mul = fmul <2 x float> <float 15.000000e+00, float 15.000000e+00>, %conv		%mul = fmul <2 x float> <float 15.000000e+00, float 15.000000e+00>, %conv
ret <2 x float> %mul		ret <2 x float> %mul
}		}

define <2 x double> @fmul_pow_shl_cnt_vec(<2 x i64> %cnt) {		define <2 x double> @fmul_pow_shl_cnt_vec(<2 x i64> %cnt) {
; CHECK-SSE-LABEL: fmul_pow_shl_cnt_vec:		; CHECK-SSE-LABEL: fmul_pow_shl_cnt_vec:
; CHECK-SSE: # %bb.0:		; CHECK-SSE: # %bb.0:
; CHECK-SSE-NEXT: movdqa {{.*#+}} xmm1 = [2,2]		; CHECK-SSE-NEXT: psllq $52, %xmm0
; CHECK-SSE-NEXT: movdqa %xmm1, %xmm2		; CHECK-SSE-NEXT: paddq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
; CHECK-SSE-NEXT: psllq %xmm0, %xmm2
; CHECK-SSE-NEXT: pshufd {{.*#+}} xmm0 = xmm0[2,3,2,3]
; CHECK-SSE-NEXT: psllq %xmm0, %xmm1
; CHECK-SSE-NEXT: movsd {{.*#+}} xmm1 = xmm2[0],xmm1[1]
; CHECK-SSE-NEXT: movapd {{.*#+}} xmm0 = [4294967295,4294967295]
; CHECK-SSE-NEXT: andpd %xmm1, %xmm0
; CHECK-SSE-NEXT: orpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
; CHECK-SSE-NEXT: psrlq $32, %xmm1
; CHECK-SSE-NEXT: por {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
; CHECK-SSE-NEXT: subpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
; CHECK-SSE-NEXT: addpd %xmm0, %xmm1
; CHECK-SSE-NEXT: mulpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
; CHECK-SSE-NEXT: movapd %xmm1, %xmm0
; CHECK-SSE-NEXT: retq		; CHECK-SSE-NEXT: retq
;		;
; CHECK-AVX-LABEL: fmul_pow_shl_cnt_vec:		; CHECK-AVX-LABEL: fmul_pow_shl_cnt_vec:
; CHECK-AVX: # %bb.0:		; CHECK-AVX: # %bb.0:
; CHECK-AVX-NEXT: vpbroadcastq {{.*#+}} xmm1 = [2,2]		; CHECK-AVX-NEXT: vpsllq $52, %xmm0, %xmm0
; CHECK-AVX-NEXT: vpsllvq %xmm0, %xmm1, %xmm0		; CHECK-AVX-NEXT: vpaddq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
; CHECK-AVX-NEXT: vpxor %xmm1, %xmm1, %xmm1
; CHECK-AVX-NEXT: vpblendd {{.*#+}} xmm1 = xmm0[0],xmm1[1],xmm0[2],xmm1[3]
; CHECK-AVX-NEXT: vpor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1
; CHECK-AVX-NEXT: vpsrlq $32, %xmm0, %xmm0
; CHECK-AVX-NEXT: vpor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
; CHECK-AVX-NEXT: vsubpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
; CHECK-AVX-NEXT: vaddpd %xmm0, %xmm1, %xmm0
; CHECK-AVX-NEXT: vmulpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
; CHECK-AVX-NEXT: retq		; CHECK-AVX-NEXT: retq
%shl = shl nsw nuw <2 x i64> <i64 2, i64 2>, %cnt		%shl = shl nsw nuw <2 x i64> <i64 2, i64 2>, %cnt
%conv = uitofp <2 x i64> %shl to <2 x double>		%conv = uitofp <2 x i64> %shl to <2 x double>
%mul = fmul <2 x double> <double 15.000000e+00, double 15.000000e+00>, %conv		%mul = fmul <2 x double> <double 15.000000e+00, double 15.000000e+00>, %conv
ret <2 x double> %mul		ret <2 x double> %mul
}		}

define <2 x double> @fmul_pow_shl_cnt_vec_non_splat_todo(<2 x i64> %cnt) {		define <2 x double> @fmul_pow_shl_cnt_vec_non_splat_todo(<2 x i64> %cnt) {
Show All 33 Lines	; CHECK-AVX-NEXT: retq
%conv = uitofp <2 x i64> %shl to <2 x double>		%conv = uitofp <2 x i64> %shl to <2 x double>
%mul = fmul <2 x double> <double 15.000000e+00, double 14.000000e+00>, %conv		%mul = fmul <2 x double> <double 15.000000e+00, double 14.000000e+00>, %conv
ret <2 x double> %mul		ret <2 x double> %mul
}		}

define <2 x double> @fmul_pow_shl_cnt_vec_non_splat2_todo(<2 x i64> %cnt) {		define <2 x double> @fmul_pow_shl_cnt_vec_non_splat2_todo(<2 x i64> %cnt) {
; CHECK-SSE-LABEL: fmul_pow_shl_cnt_vec_non_splat2_todo:		; CHECK-SSE-LABEL: fmul_pow_shl_cnt_vec_non_splat2_todo:
; CHECK-SSE: # %bb.0:		; CHECK-SSE: # %bb.0:
; CHECK-SSE-NEXT: movdqa {{.*#+}} xmm1 = [2,1]		; CHECK-SSE-NEXT: psllq $52, %xmm0
; CHECK-SSE-NEXT: movdqa %xmm1, %xmm2		; CHECK-SSE-NEXT: paddq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
; CHECK-SSE-NEXT: psllq %xmm0, %xmm2
; CHECK-SSE-NEXT: pshufd {{.*#+}} xmm0 = xmm0[2,3,2,3]
; CHECK-SSE-NEXT: psllq %xmm0, %xmm1
; CHECK-SSE-NEXT: movsd {{.*#+}} xmm1 = xmm2[0],xmm1[1]
; CHECK-SSE-NEXT: movapd {{.*#+}} xmm0 = [4294967295,4294967295]
; CHECK-SSE-NEXT: andpd %xmm1, %xmm0
; CHECK-SSE-NEXT: orpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
; CHECK-SSE-NEXT: psrlq $32, %xmm1
; CHECK-SSE-NEXT: por {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
; CHECK-SSE-NEXT: subpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
; CHECK-SSE-NEXT: addpd %xmm0, %xmm1
; CHECK-SSE-NEXT: mulpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
; CHECK-SSE-NEXT: movapd %xmm1, %xmm0
; CHECK-SSE-NEXT: retq		; CHECK-SSE-NEXT: retq
;		;
; CHECK-AVX-LABEL: fmul_pow_shl_cnt_vec_non_splat2_todo:		; CHECK-AVX-LABEL: fmul_pow_shl_cnt_vec_non_splat2_todo:
; CHECK-AVX: # %bb.0:		; CHECK-AVX: # %bb.0:
; CHECK-AVX-NEXT: vmovdqa {{.*#+}} xmm1 = [2,1]		; CHECK-AVX-NEXT: vpsllq $52, %xmm0, %xmm0
; CHECK-AVX-NEXT: vpsllvq %xmm0, %xmm1, %xmm0		; CHECK-AVX-NEXT: vpaddq {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
; CHECK-AVX-NEXT: vpxor %xmm1, %xmm1, %xmm1
; CHECK-AVX-NEXT: vpblendd {{.*#+}} xmm1 = xmm0[0],xmm1[1],xmm0[2],xmm1[3]
; CHECK-AVX-NEXT: vpor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1
; CHECK-AVX-NEXT: vpsrlq $32, %xmm0, %xmm0
; CHECK-AVX-NEXT: vpor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
; CHECK-AVX-NEXT: vsubpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
; CHECK-AVX-NEXT: vaddpd %xmm0, %xmm1, %xmm0
; CHECK-AVX-NEXT: vmulpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
; CHECK-AVX-NEXT: retq		; CHECK-AVX-NEXT: retq
%shl = shl nsw nuw <2 x i64> <i64 2, i64 1>, %cnt		%shl = shl nsw nuw <2 x i64> <i64 2, i64 1>, %cnt
%conv = uitofp <2 x i64> %shl to <2 x double>		%conv = uitofp <2 x i64> %shl to <2 x double>
%mul = fmul <2 x double> <double 15.000000e+00, double 15.000000e+00>, %conv		%mul = fmul <2 x double> <double 15.000000e+00, double 15.000000e+00>, %conv
ret <2 x double> %mul		ret <2 x double> %mul
}		}

define <2 x half> @fmul_pow_shl_cnt_vec_fail_to_large(<2 x i16> %cnt) {		define <2 x half> @fmul_pow_shl_cnt_vec_fail_to_large(<2 x i16> %cnt) {
▲ Show 20 Lines • Show All 145 Lines • ▼ Show 20 Lines	; CHECK-AVX512F-NEXT: retq
%conv = uitofp i64 %shl to double		%conv = uitofp i64 %shl to double
%mul = fmul double 9.745314e+288, %conv		%mul = fmul double 9.745314e+288, %conv
ret double %mul		ret double %mul
}		}

define double @fmul_pow_shl_cnt_safe(i16 %cnt) {		define double @fmul_pow_shl_cnt_safe(i16 %cnt) {
; CHECK-SSE-LABEL: fmul_pow_shl_cnt_safe:		; CHECK-SSE-LABEL: fmul_pow_shl_cnt_safe:
; CHECK-SSE: # %bb.0:		; CHECK-SSE: # %bb.0:
; CHECK-SSE-NEXT: movl %edi, %ecx		; CHECK-SSE-NEXT: # kill: def $edi killed $edi def $rdi
; CHECK-SSE-NEXT: movl $1, %eax		; CHECK-SSE-NEXT: shlq $52, %rdi
; CHECK-SSE-NEXT: # kill: def $cl killed $cl killed $ecx		; CHECK-SSE-NEXT: movabsq $8930638061065157010, %rax # imm = 0x7BEFFFFFFF5F3992
; CHECK-SSE-NEXT: shll %cl, %eax		; CHECK-SSE-NEXT: addq %rdi, %rax
; CHECK-SSE-NEXT: movzwl %ax, %eax		; CHECK-SSE-NEXT: movq %rax, %xmm0
; CHECK-SSE-NEXT: cvtsi2sd %eax, %xmm0
; CHECK-SSE-NEXT: mulsd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
; CHECK-SSE-NEXT: retq		; CHECK-SSE-NEXT: retq
;		;
; CHECK-AVX-LABEL: fmul_pow_shl_cnt_safe:		; CHECK-AVX-LABEL: fmul_pow_shl_cnt_safe:
; CHECK-AVX: # %bb.0:		; CHECK-AVX: # %bb.0:
; CHECK-AVX-NEXT: movl %edi, %ecx		; CHECK-AVX-NEXT: # kill: def $edi killed $edi def $rdi
; CHECK-AVX-NEXT: movl $1, %eax		; CHECK-AVX-NEXT: shlq $52, %rdi
; CHECK-AVX-NEXT: # kill: def $cl killed $cl killed $ecx		; CHECK-AVX-NEXT: movabsq $8930638061065157010, %rax # imm = 0x7BEFFFFFFF5F3992
; CHECK-AVX-NEXT: shll %cl, %eax		; CHECK-AVX-NEXT: addq %rdi, %rax
; CHECK-AVX-NEXT: movzwl %ax, %eax		; CHECK-AVX-NEXT: vmovq %rax, %xmm0
; CHECK-AVX-NEXT: vcvtsi2sd %eax, %xmm0, %xmm0
; CHECK-AVX-NEXT: vmulsd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
; CHECK-AVX-NEXT: retq		; CHECK-AVX-NEXT: retq
%shl = shl nuw i16 1, %cnt		%shl = shl nuw i16 1, %cnt
%conv = uitofp i16 %shl to double		%conv = uitofp i16 %shl to double
%mul = fmul double 9.745314e+288, %conv		%mul = fmul double 9.745314e+288, %conv
ret double %mul		ret double %mul
}		}

define <2 x double> @fdiv_pow_shl_cnt_vec(<2 x i64> %cnt) {		define <2 x double> @fdiv_pow_shl_cnt_vec(<2 x i64> %cnt) {
; CHECK-SSE-LABEL: fdiv_pow_shl_cnt_vec:		; CHECK-SSE-LABEL: fdiv_pow_shl_cnt_vec:
; CHECK-SSE: # %bb.0:		; CHECK-SSE: # %bb.0:
; CHECK-SSE-NEXT: movdqa {{.*#+}} xmm1 = [1,1]		; CHECK-SSE-NEXT: psllq $52, %xmm0
; CHECK-SSE-NEXT: movdqa %xmm1, %xmm2		; CHECK-SSE-NEXT: movdqa {{.*#+}} xmm1 = [4607182418800017408,4607182418800017408]
; CHECK-SSE-NEXT: psllq %xmm0, %xmm2		; CHECK-SSE-NEXT: psubq %xmm0, %xmm1
; CHECK-SSE-NEXT: pshufd {{.*#+}} xmm0 = xmm0[2,3,2,3]		; CHECK-SSE-NEXT: movdqa %xmm1, %xmm0
; CHECK-SSE-NEXT: psllq %xmm0, %xmm1
; CHECK-SSE-NEXT: movsd {{.*#+}} xmm1 = xmm2[0],xmm1[1]
; CHECK-SSE-NEXT: movapd {{.*#+}} xmm0 = [4294967295,4294967295]
; CHECK-SSE-NEXT: andpd %xmm1, %xmm0
; CHECK-SSE-NEXT: orpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
; CHECK-SSE-NEXT: psrlq $32, %xmm1
; CHECK-SSE-NEXT: por {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
; CHECK-SSE-NEXT: subpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
; CHECK-SSE-NEXT: addpd %xmm0, %xmm1
; CHECK-SSE-NEXT: movapd {{.*#+}} xmm0 = [1.0E+0,1.0E+0]
; CHECK-SSE-NEXT: divpd %xmm1, %xmm0
; CHECK-SSE-NEXT: retq		; CHECK-SSE-NEXT: retq
;		;
; CHECK-AVX-LABEL: fdiv_pow_shl_cnt_vec:		; CHECK-AVX-LABEL: fdiv_pow_shl_cnt_vec:
; CHECK-AVX: # %bb.0:		; CHECK-AVX: # %bb.0:
; CHECK-AVX-NEXT: vpbroadcastq {{.*#+}} xmm1 = [1,1]		; CHECK-AVX-NEXT: vpsllq $52, %xmm0, %xmm0
; CHECK-AVX-NEXT: vpsllvq %xmm0, %xmm1, %xmm0		; CHECK-AVX-NEXT: vpbroadcastq {{.*#+}} xmm1 = [4607182418800017408,4607182418800017408]
; CHECK-AVX-NEXT: vpxor %xmm1, %xmm1, %xmm1		; CHECK-AVX-NEXT: vpsubq %xmm0, %xmm1, %xmm0
; CHECK-AVX-NEXT: vpblendd {{.*#+}} xmm1 = xmm0[0],xmm1[1],xmm0[2],xmm1[3]
; CHECK-AVX-NEXT: vpor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1, %xmm1
; CHECK-AVX-NEXT: vpsrlq $32, %xmm0, %xmm0
; CHECK-AVX-NEXT: vpor {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
; CHECK-AVX-NEXT: vsubpd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
; CHECK-AVX-NEXT: vaddpd %xmm0, %xmm1, %xmm0
; CHECK-AVX-NEXT: vmovddup {{.*#+}} xmm1 = [1.0E+0,1.0E+0]
; CHECK-AVX-NEXT: # xmm1 = mem[0,0]
; CHECK-AVX-NEXT: vdivpd %xmm0, %xmm1, %xmm0
; CHECK-AVX-NEXT: retq		; CHECK-AVX-NEXT: retq
%shl = shl nuw <2 x i64> <i64 1, i64 1>, %cnt		%shl = shl nuw <2 x i64> <i64 1, i64 1>, %cnt
%conv = uitofp <2 x i64> %shl to <2 x double>		%conv = uitofp <2 x i64> %shl to <2 x double>
%mul = fdiv <2 x double> <double 1.000000e+00, double 1.000000e+00>, %conv		%mul = fdiv <2 x double> <double 1.000000e+00, double 1.000000e+00>, %conv
ret <2 x double> %mul		ret <2 x double> %mul
}		}

define <2 x float> @fdiv_pow_shl_cnt_vec_with_expensive_cast(<2 x i64> %cnt) {		define <2 x float> @fdiv_pow_shl_cnt_vec_with_expensive_cast(<2 x i64> %cnt) {
; CHECK-SSE-LABEL: fdiv_pow_shl_cnt_vec_with_expensive_cast:		; CHECK-SSE-LABEL: fdiv_pow_shl_cnt_vec_with_expensive_cast:
; CHECK-SSE: # %bb.0:		; CHECK-SSE: # %bb.0:
; CHECK-SSE-NEXT: pshufd {{.*#+}} xmm1 = xmm0[2,3,2,3]		; CHECK-SSE-NEXT: pshufd {{.*#+}} xmm1 = xmm0[0,2,2,3]
; CHECK-SSE-NEXT: movdqa {{.*#+}} xmm3 = [1,1]		; CHECK-SSE-NEXT: pslld $23, %xmm1
; CHECK-SSE-NEXT: movdqa %xmm3, %xmm2		; CHECK-SSE-NEXT: movdqa {{.*#+}} xmm0 = <1065353216,1065353216,u,u>
; CHECK-SSE-NEXT: psllq %xmm1, %xmm2		; CHECK-SSE-NEXT: psubd %xmm1, %xmm0
; CHECK-SSE-NEXT: psllq %xmm0, %xmm3
; CHECK-SSE-NEXT: movq %xmm3, %rax
; CHECK-SSE-NEXT: testq %rax, %rax
; CHECK-SSE-NEXT: js .LBB14_1
; CHECK-SSE-NEXT: # %bb.2:
; CHECK-SSE-NEXT: xorps %xmm1, %xmm1
; CHECK-SSE-NEXT: cvtsi2ss %rax, %xmm1
; CHECK-SSE-NEXT: jmp .LBB14_3
; CHECK-SSE-NEXT: .LBB14_1:
; CHECK-SSE-NEXT: movq %rax, %rcx
; CHECK-SSE-NEXT: shrq %rcx
; CHECK-SSE-NEXT: andl $1, %eax
; CHECK-SSE-NEXT: orq %rcx, %rax
; CHECK-SSE-NEXT: xorps %xmm1, %xmm1
; CHECK-SSE-NEXT: cvtsi2ss %rax, %xmm1
; CHECK-SSE-NEXT: addss %xmm1, %xmm1
; CHECK-SSE-NEXT: .LBB14_3:
; CHECK-SSE-NEXT: pshufd {{.*#+}} xmm0 = xmm2[2,3,2,3]
; CHECK-SSE-NEXT: movq %xmm0, %rax
; CHECK-SSE-NEXT: testq %rax, %rax
; CHECK-SSE-NEXT: js .LBB14_4
; CHECK-SSE-NEXT: # %bb.5:
; CHECK-SSE-NEXT: xorps %xmm0, %xmm0
; CHECK-SSE-NEXT: cvtsi2ss %rax, %xmm0
; CHECK-SSE-NEXT: jmp .LBB14_6
; CHECK-SSE-NEXT: .LBB14_4:
; CHECK-SSE-NEXT: movq %rax, %rcx
; CHECK-SSE-NEXT: shrq %rcx
; CHECK-SSE-NEXT: andl $1, %eax
; CHECK-SSE-NEXT: orq %rcx, %rax
; CHECK-SSE-NEXT: xorps %xmm0, %xmm0
; CHECK-SSE-NEXT: cvtsi2ss %rax, %xmm0
; CHECK-SSE-NEXT: addss %xmm0, %xmm0
; CHECK-SSE-NEXT: .LBB14_6:
; CHECK-SSE-NEXT: unpcklps {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
; CHECK-SSE-NEXT: movaps {{.*#+}} xmm0 = <1.0E+0,1.0E+0,u,u>
; CHECK-SSE-NEXT: divps %xmm1, %xmm0
; CHECK-SSE-NEXT: retq		; CHECK-SSE-NEXT: retq
;		;
; CHECK-AVX2-LABEL: fdiv_pow_shl_cnt_vec_with_expensive_cast:		; CHECK-AVX-LABEL: fdiv_pow_shl_cnt_vec_with_expensive_cast:
; CHECK-AVX2: # %bb.0:		; CHECK-AVX: # %bb.0:
; CHECK-AVX2-NEXT: vpbroadcastq {{.*#+}} xmm1 = [1,1]		; CHECK-AVX-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
; CHECK-AVX2-NEXT: vpsllvq %xmm0, %xmm1, %xmm0		; CHECK-AVX-NEXT: vpslld $23, %xmm0, %xmm0
; CHECK-AVX2-NEXT: vpand %xmm1, %xmm0, %xmm1		; CHECK-AVX-NEXT: vpbroadcastd {{.*#+}} xmm1 = [1065353216,1065353216,1065353216,1065353216]
; CHECK-AVX2-NEXT: vpsrlq $1, %xmm0, %xmm2		; CHECK-AVX-NEXT: vpsubd %xmm0, %xmm1, %xmm0
; CHECK-AVX2-NEXT: vpor %xmm1, %xmm2, %xmm1		; CHECK-AVX-NEXT: retq
; CHECK-AVX2-NEXT: vblendvpd %xmm0, %xmm1, %xmm0, %xmm1
; CHECK-AVX2-NEXT: vpextrq $1, %xmm1, %rax
; CHECK-AVX2-NEXT: vcvtsi2ss %rax, %xmm3, %xmm2
; CHECK-AVX2-NEXT: vmovq %xmm1, %rax
; CHECK-AVX2-NEXT: vcvtsi2ss %rax, %xmm3, %xmm1
; CHECK-AVX2-NEXT: vinsertps {{.*#+}} xmm1 = xmm1[0],xmm2[0],zero,zero
; CHECK-AVX2-NEXT: vaddps %xmm1, %xmm1, %xmm2
; CHECK-AVX2-NEXT: vpxor %xmm3, %xmm3, %xmm3
; CHECK-AVX2-NEXT: vpcmpgtq %xmm0, %xmm3, %xmm0
; CHECK-AVX2-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[1,3,2,3]
; CHECK-AVX2-NEXT: vblendvps %xmm0, %xmm2, %xmm1, %xmm0
; CHECK-AVX2-NEXT: vbroadcastss {{.*#+}} xmm1 = [1.0E+0,1.0E+0,1.0E+0,1.0E+0]
; CHECK-AVX2-NEXT: vdivps %xmm0, %xmm1, %xmm0
; CHECK-AVX2-NEXT: retq
;
; CHECK-AVX512F-LABEL: fdiv_pow_shl_cnt_vec_with_expensive_cast:
; CHECK-AVX512F: # %bb.0:
; CHECK-AVX512F-NEXT: vpbroadcastq {{.*#+}} xmm1 = [1,1]
; CHECK-AVX512F-NEXT: vpsllvq %xmm0, %xmm1, %xmm0
; CHECK-AVX512F-NEXT: vpextrq $1, %xmm0, %rax
; CHECK-AVX512F-NEXT: vcvtusi2ss %rax, %xmm2, %xmm1
; CHECK-AVX512F-NEXT: vmovq %xmm0, %rax
; CHECK-AVX512F-NEXT: vcvtusi2ss %rax, %xmm2, %xmm0
; CHECK-AVX512F-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0],xmm1[0],zero,zero
; CHECK-AVX512F-NEXT: vbroadcastss {{.*#+}} xmm1 = [1.0E+0,1.0E+0,1.0E+0,1.0E+0]
; CHECK-AVX512F-NEXT: vdivps %xmm0, %xmm1, %xmm0
; CHECK-AVX512F-NEXT: retq
%shl = shl nuw <2 x i64> <i64 1, i64 1>, %cnt		%shl = shl nuw <2 x i64> <i64 1, i64 1>, %cnt
%conv = uitofp <2 x i64> %shl to <2 x float>		%conv = uitofp <2 x i64> %shl to <2 x float>
%mul = fdiv <2 x float> <float 1.000000e+00, float 1.000000e+00>, %conv		%mul = fdiv <2 x float> <float 1.000000e+00, float 1.000000e+00>, %conv
ret <2 x float> %mul		ret <2 x float> %mul
}		}

define float @fdiv_pow_shl_cnt_fail_maybe_z(i64 %cnt) {		define float @fdiv_pow_shl_cnt_fail_maybe_z(i64 %cnt) {
; CHECK-SSE-LABEL: fdiv_pow_shl_cnt_fail_maybe_z:		; CHECK-SSE-LABEL: fdiv_pow_shl_cnt_fail_maybe_z:
[169 lines not shown]		; CHECK-AVX512F-NEXT: retq
%conv = uitofp i32 %shl to half		%conv = uitofp i32 %shl to half
%mul = fdiv half 0xH7000, %conv		%mul = fdiv half 0xH7000, %conv
ret half %mul		ret half %mul
}		}

define half @fdiv_pow_shl_cnt_in_bounds(i16 %cnt) {		define half @fdiv_pow_shl_cnt_in_bounds(i16 %cnt) {
; CHECK-SSE-LABEL: fdiv_pow_shl_cnt_in_bounds:		; CHECK-SSE-LABEL: fdiv_pow_shl_cnt_in_bounds:
; CHECK-SSE: # %bb.0:		; CHECK-SSE: # %bb.0:
; CHECK-SSE-NEXT: pushq %rax		; CHECK-SSE-NEXT: shll $10, %edi
; CHECK-SSE-NEXT: .cfi_def_cfa_offset 16		; CHECK-SSE-NEXT: movl $28672, %eax # imm = 0x7000
; CHECK-SSE-NEXT: movl %edi, %ecx		; CHECK-SSE-NEXT: subl %edi, %eax
; CHECK-SSE-NEXT: movl $1, %eax		; CHECK-SSE-NEXT: pinsrw $0, %eax, %xmm0
; CHECK-SSE-NEXT: # kill: def $cl killed $cl killed $ecx
; CHECK-SSE-NEXT: shll %cl, %eax
; CHECK-SSE-NEXT: movzwl %ax, %eax
; CHECK-SSE-NEXT: cvtsi2ss %eax, %xmm0
; CHECK-SSE-NEXT: callq __truncsfhf2@PLT
; CHECK-SSE-NEXT: callq __extendhfsf2@PLT
; CHECK-SSE-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero
; CHECK-SSE-NEXT: divss %xmm0, %xmm1
; CHECK-SSE-NEXT: movaps %xmm1, %xmm0
; CHECK-SSE-NEXT: callq __truncsfhf2@PLT
; CHECK-SSE-NEXT: popq %rax
; CHECK-SSE-NEXT: .cfi_def_cfa_offset 8
; CHECK-SSE-NEXT: retq		; CHECK-SSE-NEXT: retq
;		;
; CHECK-AVX2-LABEL: fdiv_pow_shl_cnt_in_bounds:		; CHECK-AVX-LABEL: fdiv_pow_shl_cnt_in_bounds:
; CHECK-AVX2: # %bb.0:		; CHECK-AVX: # %bb.0:
; CHECK-AVX2-NEXT: pushq %rax		; CHECK-AVX-NEXT: shll $10, %edi
; CHECK-AVX2-NEXT: .cfi_def_cfa_offset 16		; CHECK-AVX-NEXT: movl $28672, %eax # imm = 0x7000
; CHECK-AVX2-NEXT: movl %edi, %ecx		; CHECK-AVX-NEXT: subl %edi, %eax
; CHECK-AVX2-NEXT: movl $1, %eax		; CHECK-AVX-NEXT: vpinsrw $0, %eax, %xmm0, %xmm0
; CHECK-AVX2-NEXT: # kill: def $cl killed $cl killed $ecx		; CHECK-AVX-NEXT: retq
; CHECK-AVX2-NEXT: shll %cl, %eax
; CHECK-AVX2-NEXT: movzwl %ax, %eax
; CHECK-AVX2-NEXT: vcvtsi2ss %eax, %xmm0, %xmm0
; CHECK-AVX2-NEXT: callq __truncsfhf2@PLT
; CHECK-AVX2-NEXT: callq __extendhfsf2@PLT
; CHECK-AVX2-NEXT: vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero
; CHECK-AVX2-NEXT: vdivss %xmm0, %xmm1, %xmm0
; CHECK-AVX2-NEXT: callq __truncsfhf2@PLT
; CHECK-AVX2-NEXT: popq %rax
; CHECK-AVX2-NEXT: .cfi_def_cfa_offset 8
; CHECK-AVX2-NEXT: retq
;
; CHECK-AVX512F-LABEL: fdiv_pow_shl_cnt_in_bounds:
; CHECK-AVX512F: # %bb.0:
; CHECK-AVX512F-NEXT: movl %edi, %ecx
; CHECK-AVX512F-NEXT: movl $1, %eax
; CHECK-AVX512F-NEXT: # kill: def $cl killed $cl killed $ecx
; CHECK-AVX512F-NEXT: shll %cl, %eax
; CHECK-AVX512F-NEXT: movzwl %ax, %eax
; CHECK-AVX512F-NEXT: vcvtsi2ss %eax, %xmm0, %xmm0
; CHECK-AVX512F-NEXT: vcvtps2ph $4, %xmm0, %xmm0
; CHECK-AVX512F-NEXT: vpmovzxwq {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero
; CHECK-AVX512F-NEXT: vcvtph2ps %xmm0, %xmm0
; CHECK-AVX512F-NEXT: vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero
; CHECK-AVX512F-NEXT: vdivss %xmm0, %xmm1, %xmm0
; CHECK-AVX512F-NEXT: vcvtps2ph $4, %xmm0, %xmm0
; CHECK-AVX512F-NEXT: vmovd %xmm0, %eax
; CHECK-AVX512F-NEXT: vpinsrw $0, %eax, %xmm0, %xmm0
; CHECK-AVX512F-NEXT: retq
%shl = shl nuw i16 1, %cnt		%shl = shl nuw i16 1, %cnt
%conv = uitofp i16 %shl to half		%conv = uitofp i16 %shl to half
%mul = fdiv half 0xH7000, %conv		%mul = fdiv half 0xH7000, %conv
ret half %mul		ret half %mul
}		}

define half @fdiv_pow_shl_cnt_in_bounds2(i16 %cnt) {		define half @fdiv_pow_shl_cnt_in_bounds2(i16 %cnt) {
; CHECK-SSE-LABEL: fdiv_pow_shl_cnt_in_bounds2:		; CHECK-SSE-LABEL: fdiv_pow_shl_cnt_in_bounds2:
; CHECK-SSE: # %bb.0:		; CHECK-SSE: # %bb.0:
; CHECK-SSE-NEXT: pushq %rax		; CHECK-SSE-NEXT: shll $10, %edi
; CHECK-SSE-NEXT: .cfi_def_cfa_offset 16		; CHECK-SSE-NEXT: movl $18432, %eax # imm = 0x4800
; CHECK-SSE-NEXT: movl %edi, %ecx		; CHECK-SSE-NEXT: subl %edi, %eax
; CHECK-SSE-NEXT: movl $1, %eax		; CHECK-SSE-NEXT: pinsrw $0, %eax, %xmm0
; CHECK-SSE-NEXT: # kill: def $cl killed $cl killed $ecx
; CHECK-SSE-NEXT: shll %cl, %eax
; CHECK-SSE-NEXT: movzwl %ax, %eax
; CHECK-SSE-NEXT: cvtsi2ss %eax, %xmm0
; CHECK-SSE-NEXT: callq __truncsfhf2@PLT
; CHECK-SSE-NEXT: callq __extendhfsf2@PLT
; CHECK-SSE-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero
; CHECK-SSE-NEXT: divss %xmm0, %xmm1
; CHECK-SSE-NEXT: movaps %xmm1, %xmm0
; CHECK-SSE-NEXT: callq __truncsfhf2@PLT
; CHECK-SSE-NEXT: popq %rax
; CHECK-SSE-NEXT: .cfi_def_cfa_offset 8
; CHECK-SSE-NEXT: retq		; CHECK-SSE-NEXT: retq
;		;
; CHECK-AVX2-LABEL: fdiv_pow_shl_cnt_in_bounds2:		; CHECK-AVX-LABEL: fdiv_pow_shl_cnt_in_bounds2:
; CHECK-AVX2: # %bb.0:		; CHECK-AVX: # %bb.0:
; CHECK-AVX2-NEXT: pushq %rax		; CHECK-AVX-NEXT: shll $10, %edi
; CHECK-AVX2-NEXT: .cfi_def_cfa_offset 16		; CHECK-AVX-NEXT: movl $18432, %eax # imm = 0x4800
; CHECK-AVX2-NEXT: movl %edi, %ecx		; CHECK-AVX-NEXT: subl %edi, %eax
; CHECK-AVX2-NEXT: movl $1, %eax		; CHECK-AVX-NEXT: vpinsrw $0, %eax, %xmm0, %xmm0
; CHECK-AVX2-NEXT: # kill: def $cl killed $cl killed $ecx		; CHECK-AVX-NEXT: retq
; CHECK-AVX2-NEXT: shll %cl, %eax
; CHECK-AVX2-NEXT: movzwl %ax, %eax
; CHECK-AVX2-NEXT: vcvtsi2ss %eax, %xmm0, %xmm0
; CHECK-AVX2-NEXT: callq __truncsfhf2@PLT
; CHECK-AVX2-NEXT: callq __extendhfsf2@PLT
; CHECK-AVX2-NEXT: vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero
; CHECK-AVX2-NEXT: vdivss %xmm0, %xmm1, %xmm0
; CHECK-AVX2-NEXT: callq __truncsfhf2@PLT
; CHECK-AVX2-NEXT: popq %rax
; CHECK-AVX2-NEXT: .cfi_def_cfa_offset 8
; CHECK-AVX2-NEXT: retq
;
; CHECK-AVX512F-LABEL: fdiv_pow_shl_cnt_in_bounds2:
; CHECK-AVX512F: # %bb.0:
; CHECK-AVX512F-NEXT: movl %edi, %ecx
; CHECK-AVX512F-NEXT: movl $1, %eax
; CHECK-AVX512F-NEXT: # kill: def $cl killed $cl killed $ecx
; CHECK-AVX512F-NEXT: shll %cl, %eax
; CHECK-AVX512F-NEXT: movzwl %ax, %eax
; CHECK-AVX512F-NEXT: vcvtsi2ss %eax, %xmm0, %xmm0
; CHECK-AVX512F-NEXT: vcvtps2ph $4, %xmm0, %xmm0
; CHECK-AVX512F-NEXT: vpmovzxwq {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero
; CHECK-AVX512F-NEXT: vcvtph2ps %xmm0, %xmm0
; CHECK-AVX512F-NEXT: vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero
; CHECK-AVX512F-NEXT: vdivss %xmm0, %xmm1, %xmm0
; CHECK-AVX512F-NEXT: vcvtps2ph $4, %xmm0, %xmm0
; CHECK-AVX512F-NEXT: vmovd %xmm0, %eax
; CHECK-AVX512F-NEXT: vpinsrw $0, %eax, %xmm0, %xmm0
; CHECK-AVX512F-NEXT: retq
%shl = shl nuw i16 1, %cnt		%shl = shl nuw i16 1, %cnt
%conv = uitofp i16 %shl to half		%conv = uitofp i16 %shl to half
%mul = fdiv half 0xH4800, %conv		%mul = fdiv half 0xH4800, %conv
ret half %mul		ret half %mul
}		}

define half @fdiv_pow_shl_cnt_fail_out_of_bound2(i16 %cnt) {		define half @fdiv_pow_shl_cnt_fail_out_of_bound2(i16 %cnt) {
; CHECK-SSE-LABEL: fdiv_pow_shl_cnt_fail_out_of_bound2:		; CHECK-SSE-LABEL: fdiv_pow_shl_cnt_fail_out_of_bound2:
[60 lines not shown]
