This is an archive of the discontinued LLVM Phabricator instance.


Differential D126234

[AArch64] Add support for FMA intrinsics to shouldSinkOperands.
Closed · Public

Authored by fhahn on May 23 2022, 12:33 PM.


Details

Reviewers

dmgreen
samparker
t.p.northover

Commits

rG786c687810a5: [AArch64] Add support for FMA intrinsics to shouldSinkOperands.

Summary

If the fma operates on a legal vector type and its second operand is a
splat of a valid lane index, the indexed variants can be used.
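
To illustrate why a splat operand can be folded into an indexed FMA, here is a minimal scalar sketch in C++. It is not LLVM code: Vec4, splatLane, fmaVec and fmaByLane are hypothetical names, and the 4 x float shape is chosen only for illustration.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

using Vec4 = std::array<float, 4>;

// Model of a splat shufflevector: every result lane reads lane `Lane` of V.
Vec4 splatLane(const Vec4 &V, std::size_t Lane) {
  assert(Lane < V.size() && "lane index must be in range");
  Vec4 R;
  R.fill(V[Lane]);
  return R;
}

// Model of llvm.fma on vectors: lane-wise A * B + C.
Vec4 fmaVec(const Vec4 &A, const Vec4 &B, const Vec4 &C) {
  Vec4 R;
  for (std::size_t I = 0; I < R.size(); ++I)
    R[I] = std::fma(A[I], B[I], C[I]);
  return R;
}

// Model of an indexed (by-element) FMA: the multiplier is read directly
// from one lane of B, so no separate splat value is needed.
Vec4 fmaByLane(const Vec4 &A, const Vec4 &B, std::size_t Lane, const Vec4 &C) {
  assert(Lane < B.size() && "lane index must be in range");
  Vec4 R;
  for (std::size_t I = 0; I < R.size(); ++I)
    R[I] = std::fma(A[I], B[Lane], C[I]);
  return R;
}
```

In this model, fmaVec(A, splatLane(B, L), C) and fmaByLane(A, B, L, C) compute the same lanes, which is why keeping the splat next to its fma lets instruction selection pick the indexed form.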

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

fhahn created this revision. May 23 2022, 12:33 PM

Herald added a project: Restricted Project. May 23 2022, 12:33 PM

Herald added subscribers: hiraditya, kristof.beyls.

fhahn requested review of this revision. May 23 2022, 12:33 PM

Harbormaster completed remote builds in B165894: Diff 431453. May 23 2022, 1:17 PM

I think that non-legal types will be legalized to legal types where a splat can still be used for lane indexing.
And it seems like the first two operands of an fma are commutative?
And I don't think the splat index can be out of range, if we consider illegal types to be legalized.

Which means we might be able to simplify it to this, maybe with a check for fullfp16 and that the scalar types are f16/f32/f64?

+    case Intrinsic::fma:
     case Intrinsic::aarch64_neon_sqdmull:
     case Intrinsic::aarch64_neon_sqdmulh:
     case Intrinsic::aarch64_neon_sqrdmulh:
       // Sink splats for index lane variants
       if (isSplatShuffle(II->getOperand(0)))
         Ops.push_back(&II->getOperandUse(0));
       if (isSplatShuffle(II->getOperand(1)))
         Ops.push_back(&II->getOperandUse(1));
       return !Ops.empty();

Note that I have also heard of cases where the lane-indexed fma instructions are slightly slower than a normal fma, as they get cracked into a dup and an fma micro-op. That would make leaving them outside of the loop a better idea, and might make Machine Combiner a better place for this if it worked. But I don't know where that actually happens, and there doesn't seem to be an obvious place in-tree where the latencies are different, except for one old core and this Cyclone difference, which doesn't make a lot of sense: https://godbolt.org/z/63eWsEqT1. So I think sinking them unconditionally is probably fine if it saves register pressure. We just might need to change that in the future if we find cases where it does matter.

Thanks for taking a look!

In D126234#3536753, @dmgreen wrote:
I think that non-legal types will be legalized to legal types where a splat can still be used for lane indexing.
And it seems like the first two operands of an fma are commutative?
And I don't think the splat index can be out of range, if we consider illegal types to be legalized.

Which means we might be able to simplify it to this, maybe with a check for fullfp16 and that the scalar types are f16/f32/f64?
+    case Intrinsic::fma:
     case Intrinsic::aarch64_neon_sqdmull:
     case Intrinsic::aarch64_neon_sqdmulh:
     case Intrinsic::aarch64_neon_sqrdmulh:
       // Sink splats for index lane variants
       if (isSplatShuffle(II->getOperand(0)))
         Ops.push_back(&II->getOperandUse(0));
       if (isSplatShuffle(II->getOperand(1)))
         Ops.push_back(&II->getOperandUse(1));
       return !Ops.empty();

That's a good point, the original checks are probably more paranoid than necessary. I updated the patch to use the common code with aarch64_neon_sqdmul*, with a check for the element type for fp16.
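
As a rough illustration of the updated logic described above (shared splat-sinking code plus an fp16 element-type check), the following standalone C++ sketch models the decision without any LLVM types. ElemTy, Operand, isSplatMask and fmaOperandsToSink are hypothetical names invented for this sketch; the actual change is the one shown in the diff for AArch64ISelLowering.cpp.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <vector>

enum class ElemTy { Half, Float, Double };

// One of the two multiplicand operands of the fma call. ShuffleMask is set
// only when the operand is produced by a shufflevector.
struct Operand {
  std::optional<std::vector<int>> ShuffleMask;
};

// A mask is treated as a splat here when it is non-empty and every element
// selects the same source lane.
bool isSplatMask(const std::vector<int> &Mask) {
  return !Mask.empty() &&
         std::all_of(Mask.begin(), Mask.end(),
                     [&](int M) { return M == Mask.front(); });
}

bool isSplatOperand(const Operand &Op) {
  return Op.ShuffleMask && isSplatMask(*Op.ShuffleMask);
}

// Returns the indices (0 and/or 1) of the operands worth sinking next to the
// fma. An empty result means "do not sink anything".
std::vector<unsigned> fmaOperandsToSink(ElemTy Ty, bool HasFullFP16,
                                        const Operand &Op0,
                                        const Operand &Op1) {
  // Half-precision vectors are only handled when +fullfp16 is available.
  if (Ty == ElemTy::Half && !HasFullFP16)
    return {};
  std::vector<unsigned> Ops;
  // The two multiplicands are interchangeable, so check both.
  if (isSplatOperand(Op0))
    Ops.push_back(0);
  if (isSplatOperand(Op1))
    Ops.push_back(1);
  return Ops;
}
```

With a half element type and no fullfp16 the sketch returns an empty list, which corresponds to the NOFP16 expectations in the test, where the splats stay in the entry block.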


From the publicly available ARM software optimization guides, it looks like the indexed/non-indexed variants should give the same performance on A57/A75, but the indexed variants have lower throughput on A55. A quick glance at the scheduling model for A55 seems to indicate that this difference is not reflected in the model though.

It also looks like the same issue would exist for the other indexed variants handled there. If issues show up, investigating more granular handling sounds like a good idea. We could also limit the sinking to cores where indexed and non-indexed variants have the same performance, if needed.
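
If such a per-core restriction were ever wanted, it could look roughly like the sketch below. This is purely hypothetical: the patch does not do this, and Core, indexedFmaMatchesPlain and shouldSinkSplatForCore are invented names. The table only encodes the reading of the optimization guides given above.

```cpp
#include <cassert>

// Hypothetical core list; only the cores named in the discussion appear.
enum class Core { CortexA55, CortexA57, CortexA75 };

// True when, per the reading of the optimization guides given above, the
// indexed and non-indexed FMA forms are expected to perform the same.
bool indexedFmaMatchesPlain(Core C) {
  switch (C) {
  case Core::CortexA57:
  case Core::CortexA75:
    return true;
  case Core::CortexA55:
    return false; // indexed form reported to have lower throughput
  }
  return false; // conservative default for unexpected values
}

// Possible gate: sink splats only where the indexed form costs nothing extra.
bool shouldSinkSplatForCore(Core C) { return indexedFmaMatchesPlain(C); }
```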

Harbormaster completed remote builds in B166435: Diff 432223. May 26 2022, 3:37 AM


The testing I usually run included some A55 runs, and I wasn't seeing any differences in performance. It turns out that the test case I have for fma+dup would come through to MachineCombiner as FADD+FMUL+DUP, not llvm.fma. It was MachineCombiner that did the conversion in the end though (at least for one of the two fmas; the other becomes a non-lane-indexed version because the pattern is actually FADD(FMUL(COPY(DUP(..)))), and I don't think the Machine Combiner is ignoring no-op copies).
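
The missed FADD(FMUL(COPY(DUP(..)))) case can be pictured with a toy matcher. This only illustrates the suspected behaviour and is not MachineCombiner code; Opcode, Node, isFedByDup and canUseIndexedForm are made-up names.

```cpp
#include <cassert>

enum class Opcode { DUP, COPY, FMUL, OTHER };

// A toy single-operand view of a machine instruction chain.
struct Node {
  Opcode Op;
  const Node *Src; // defining instruction of the operand of interest, or null
};

// Walks past COPY nodes when LookThroughCopies is set, then reports whether
// the value is defined by a DUP.
bool isFedByDup(const Node *N, bool LookThroughCopies) {
  while (LookThroughCopies && N && N->Op == Opcode::COPY)
    N = N->Src;
  return N && N->Op == Opcode::DUP;
}

// An FMUL can use the lane-indexed form in this model only if its operand is
// recognised as coming from a DUP.
bool canUseIndexedForm(const Node &Mul, bool LookThroughCopies) {
  assert(Mul.Op == Opcode::FMUL && "expected a multiply");
  return isFedByDup(Mul.Src, LookThroughCopies);
}
```

A matcher that does not look through copies accepts FMUL(DUP) but rejects FMUL(COPY(DUP)), which matches the observation that only one of the two fmas became lane-indexed.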

Anyway, having fixed that, I still do not see any difference between the indexed and non-indexed variants of FMA on A55 when the indexed versions are forced off, so I think performance-wise this is OK. The indexed variants seem to perform better in general, even if they do have a lower throughput.

So LGTM. Cheers

This revision is now accepted and ready to land. May 26 2022, 9:23 AM

This revision was landed with ongoing or failed builds. May 27 2022, 2:37 AM

Closed by commit rG786c687810a5: [AArch64] Add support for FMA intrinsics to shouldSinkOperands. (authored by fhahn).

This revision was automatically updated to reflect the committed changes.

fhahn added a commit: rG786c687810a5: [AArch64] Add support for FMA intrinsics to shouldSinkOperands.

Revision Contents

Path                                                                     Size
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp                          6 lines
llvm/test/Transforms/CodeGenPrepare/AArch64/sink-free-instructions.ll    127 lines

Diff 432503

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp


(12,539 lines hidden; enclosing context: if (IntrinsicInst *II = dyn_cast<IntrinsicInst>(I)) {)
     case Intrinsic::aarch64_neon_umull:
       if (areExtractShuffleVectors(II->getOperand(0), II->getOperand(1))) {
         Ops.push_back(&II->getOperandUse(0));
         Ops.push_back(&II->getOperandUse(1));
         return true;
       }
       LLVM_FALLTHROUGH;

+    case Intrinsic::fma:
+      if (cast<VectorType>(I->getType())->getElementType()->isHalfTy() &&
+          !Subtarget->hasFullFP16())
+        return false;
+      LLVM_FALLTHROUGH;
     case Intrinsic::aarch64_neon_sqdmull:
     case Intrinsic::aarch64_neon_sqdmulh:
     case Intrinsic::aarch64_neon_sqrdmulh:
       // Sink splats for index lane variants
       if (isSplatShuffle(II->getOperand(0)))
         Ops.push_back(&II->getOperandUse(0));
       if (isSplatShuffle(II->getOperand(1)))
         Ops.push_back(&II->getOperandUse(1));
       return !Ops.empty();

     case Intrinsic::aarch64_neon_pmull:
       if (!areExtractShuffleVectors(II->getOperand(0), II->getOperand(1)))
         return false;
       Ops.push_back(&II->getOperandUse(0));
       Ops.push_back(&II->getOperandUse(1));
       return true;
     case Intrinsic::aarch64_neon_pmull64:
       if (!areOperandsOfVmullHighP64(II->getArgOperand(0),
                                      II->getArgOperand(1)))
         return false;
       Ops.push_back(&II->getArgOperandUse(0));
       Ops.push_back(&II->getArgOperandUse(1));
       return true;

     default:
       return false;
     }
   }

   switch (I->getOpcode()) {
   case Instruction::Sub:
   case Instruction::Add: {
(8,665 lines hidden)

llvm/test/Transforms/CodeGenPrepare/AArch64/sink-free-instructions.ll

 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
-; RUN: opt < %s -codegenprepare -S | FileCheck %s
+; RUN: opt < %s -codegenprepare -S | FileCheck --check-prefixes=CHECK,NOFP16 %s
+; RUN: opt < %s -codegenprepare -S -mattr=+fullfp16 | FileCheck --check-prefixes=CHECK,FULLFP16 %s

 target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
 target triple = "aarch64-unknown"

 define <8 x i16> @sink_zext(<8 x i8> %a, <8 x i8> %b, i1 %c) {
 ; CHECK-LABEL: @sink_zext(
 ; CHECK-NEXT: entry:
 ; CHECK-NEXT: br i1 [[C:%.*]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
(482 lines hidden; enclosing context: if.else:)
   %s4 = shufflevector <16 x i8> %b, <16 x i8> poison, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
   %vmull1 = tail call <8 x i16> @llvm.aarch64.neon.pmull.v8i16(<8 x i8> %s3, <8 x i8> %s4)
   ret <8 x i16> %vmull1
 }

 declare <8 x half> @llvm.fma.v8f16(<8 x half>, <8 x half>, <8 x half>)

 define <8 x half> @sink_shufflevector_fma_v8f16(i1 %c, <8 x half> %a, <8 x half> %b) {
-; CHECK-LABEL: @sink_shufflevector_fma_v8f16(
-; CHECK-NEXT: entry:
-; CHECK-NEXT: [[S0:%.*]] = shufflevector <8 x half> [[A:%.*]], <8 x half> poison, <8 x i32> zeroinitializer
-; CHECK-NEXT: [[S1:%.*]] = shufflevector <8 x half> [[A]], <8 x half> poison, <8 x i32> <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
-; CHECK-NEXT: [[S2:%.*]] = shufflevector <8 x half> [[A]], <8 x half> poison, <8 x i32> <i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2>
-; CHECK-NEXT: [[S3:%.*]] = shufflevector <8 x half> [[A]], <8 x half> poison, <8 x i32> <i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3>
-; CHECK-NEXT: [[S4:%.*]] = shufflevector <8 x half> [[A]], <8 x half> poison, <8 x i32> <i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4>
-; CHECK-NEXT: [[S5:%.*]] = shufflevector <8 x half> [[A]], <8 x half> poison, <8 x i32> <i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5>
-; CHECK-NEXT: [[S6:%.*]] = shufflevector <8 x half> [[A]], <8 x half> poison, <8 x i32> <i32 6, i32 6, i32 6, i32 6, i32 6, i32 6, i32 6, i32 6>
-; CHECK-NEXT: [[S7:%.*]] = shufflevector <8 x half> [[A]], <8 x half> poison, <8 x i32> <i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7>
-; CHECK-NEXT: br i1 [[C:%.*]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
-; CHECK: if.then:
-; CHECK-NEXT: [[R_0:%.*]] = tail call fast <8 x half> @llvm.fma.v8f16(<8 x half> [[B:%.*]], <8 x half> [[S0]], <8 x half> [[B]])
-; CHECK-NEXT: [[R_1:%.*]] = tail call fast <8 x half> @llvm.fma.v8f16(<8 x half> [[R_0]], <8 x half> [[S1]], <8 x half> [[B]])
-; CHECK-NEXT: [[R_2:%.*]] = tail call fast <8 x half> @llvm.fma.v8f16(<8 x half> [[R_1]], <8 x half> [[S2]], <8 x half> [[B]])
-; CHECK-NEXT: [[R_3:%.*]] = tail call fast <8 x half> @llvm.fma.v8f16(<8 x half> [[R_2]], <8 x half> [[S3]], <8 x half> [[B]])
-; CHECK-NEXT: ret <8 x half> [[R_3]]
-; CHECK: if.else:
-; CHECK-NEXT: [[R_4:%.*]] = tail call fast <8 x half> @llvm.fma.v8f16(<8 x half> [[B]], <8 x half> [[S4]], <8 x half> [[B]])
-; CHECK-NEXT: [[R_5:%.*]] = tail call fast <8 x half> @llvm.fma.v8f16(<8 x half> [[R_4]], <8 x half> [[S5]], <8 x half> [[B]])
-; CHECK-NEXT: [[R_6:%.*]] = tail call fast <8 x half> @llvm.fma.v8f16(<8 x half> [[R_5]], <8 x half> [[S6]], <8 x half> [[B]])
-; CHECK-NEXT: [[R_7:%.*]] = tail call fast <8 x half> @llvm.fma.v8f16(<8 x half> [[R_6]], <8 x half> [[S7]], <8 x half> [[B]])
-; CHECK-NEXT: ret <8 x half> [[R_7]]
+; NOFP16-LABEL: @sink_shufflevector_fma_v8f16(
+; NOFP16-NEXT: entry:
+; NOFP16-NEXT: [[S0:%.*]] = shufflevector <8 x half> [[A:%.*]], <8 x half> poison, <8 x i32> zeroinitializer
+; NOFP16-NEXT: [[S1:%.*]] = shufflevector <8 x half> [[A]], <8 x half> poison, <8 x i32> <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
+; NOFP16-NEXT: [[S2:%.*]] = shufflevector <8 x half> [[A]], <8 x half> poison, <8 x i32> <i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2>
+; NOFP16-NEXT: [[S3:%.*]] = shufflevector <8 x half> [[A]], <8 x half> poison, <8 x i32> <i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3>
+; NOFP16-NEXT: [[S4:%.*]] = shufflevector <8 x half> [[A]], <8 x half> poison, <8 x i32> <i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4>
+; NOFP16-NEXT: [[S5:%.*]] = shufflevector <8 x half> [[A]], <8 x half> poison, <8 x i32> <i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5>
+; NOFP16-NEXT: [[S6:%.*]] = shufflevector <8 x half> [[A]], <8 x half> poison, <8 x i32> <i32 6, i32 6, i32 6, i32 6, i32 6, i32 6, i32 6, i32 6>
+; NOFP16-NEXT: [[S7:%.*]] = shufflevector <8 x half> [[A]], <8 x half> poison, <8 x i32> <i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7>
+; NOFP16-NEXT: br i1 [[C:%.*]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
+; NOFP16: if.then:
+; NOFP16-NEXT: [[R_0:%.*]] = tail call fast <8 x half> @llvm.fma.v8f16(<8 x half> [[B:%.*]], <8 x half> [[S0]], <8 x half> [[B]])
+; NOFP16-NEXT: [[R_1:%.*]] = tail call fast <8 x half> @llvm.fma.v8f16(<8 x half> [[R_0]], <8 x half> [[S1]], <8 x half> [[B]])
+; NOFP16-NEXT: [[R_2:%.*]] = tail call fast <8 x half> @llvm.fma.v8f16(<8 x half> [[R_1]], <8 x half> [[S2]], <8 x half> [[B]])
+; NOFP16-NEXT: [[R_3:%.*]] = tail call fast <8 x half> @llvm.fma.v8f16(<8 x half> [[R_2]], <8 x half> [[S3]], <8 x half> [[B]])
+; NOFP16-NEXT: ret <8 x half> [[R_3]]
+; NOFP16: if.else:
+; NOFP16-NEXT: [[R_4:%.*]] = tail call fast <8 x half> @llvm.fma.v8f16(<8 x half> [[B]], <8 x half> [[S4]], <8 x half> [[B]])
+; NOFP16-NEXT: [[R_5:%.*]] = tail call fast <8 x half> @llvm.fma.v8f16(<8 x half> [[R_4]], <8 x half> [[S5]], <8 x half> [[B]])
+; NOFP16-NEXT: [[R_6:%.*]] = tail call fast <8 x half> @llvm.fma.v8f16(<8 x half> [[R_5]], <8 x half> [[S6]], <8 x half> [[B]])
+; NOFP16-NEXT: [[R_7:%.*]] = tail call fast <8 x half> @llvm.fma.v8f16(<8 x half> [[R_6]], <8 x half> [[S7]], <8 x half> [[B]])
+; NOFP16-NEXT: ret <8 x half> [[R_7]]
+;
+; FULLFP16-LABEL: @sink_shufflevector_fma_v8f16(
+; FULLFP16-NEXT: entry:
+; FULLFP16-NEXT: br i1 [[C:%.*]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
+; FULLFP16: if.then:
+; FULLFP16-NEXT: [[TMP0:%.*]] = shufflevector <8 x half> [[A:%.*]], <8 x half> poison, <8 x i32> zeroinitializer
+; FULLFP16-NEXT: [[R_0:%.*]] = tail call fast <8 x half> @llvm.fma.v8f16(<8 x half> [[B:%.*]], <8 x half> [[TMP0]], <8 x half> [[B]])
+; FULLFP16-NEXT: [[TMP1:%.*]] = shufflevector <8 x half> [[A]], <8 x half> poison, <8 x i32> <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
+; FULLFP16-NEXT: [[R_1:%.*]] = tail call fast <8 x half> @llvm.fma.v8f16(<8 x half> [[R_0]], <8 x half> [[TMP1]], <8 x half> [[B]])
+; FULLFP16-NEXT: [[TMP2:%.*]] = shufflevector <8 x half> [[A]], <8 x half> poison, <8 x i32> <i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2>
+; FULLFP16-NEXT: [[R_2:%.*]] = tail call fast <8 x half> @llvm.fma.v8f16(<8 x half> [[R_1]], <8 x half> [[TMP2]], <8 x half> [[B]])
+; FULLFP16-NEXT: [[TMP3:%.*]] = shufflevector <8 x half> [[A]], <8 x half> poison, <8 x i32> <i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3>
+; FULLFP16-NEXT: [[R_3:%.*]] = tail call fast <8 x half> @llvm.fma.v8f16(<8 x half> [[R_2]], <8 x half> [[TMP3]], <8 x half> [[B]])
+; FULLFP16-NEXT: ret <8 x half> [[R_3]]
+; FULLFP16: if.else:
+; FULLFP16-NEXT: [[TMP4:%.*]] = shufflevector <8 x half> [[A]], <8 x half> poison, <8 x i32> <i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4>
+; FULLFP16-NEXT: [[R_4:%.*]] = tail call fast <8 x half> @llvm.fma.v8f16(<8 x half> [[B]], <8 x half> [[TMP4]], <8 x half> [[B]])
+; FULLFP16-NEXT: [[TMP5:%.*]] = shufflevector <8 x half> [[A]], <8 x half> poison, <8 x i32> <i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5>
+; FULLFP16-NEXT: [[R_5:%.*]] = tail call fast <8 x half> @llvm.fma.v8f16(<8 x half> [[R_4]], <8 x half> [[TMP5]], <8 x half> [[B]])
+; FULLFP16-NEXT: [[TMP6:%.*]] = shufflevector <8 x half> [[A]], <8 x half> poison, <8 x i32> <i32 6, i32 6, i32 6, i32 6, i32 6, i32 6, i32 6, i32 6>
+; FULLFP16-NEXT: [[R_6:%.*]] = tail call fast <8 x half> @llvm.fma.v8f16(<8 x half> [[R_5]], <8 x half> [[TMP6]], <8 x half> [[B]])
+; FULLFP16-NEXT: [[TMP7:%.*]] = shufflevector <8 x half> [[A]], <8 x half> poison, <8 x i32> <i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7, i32 7>
+; FULLFP16-NEXT: [[R_7:%.*]] = tail call fast <8 x half> @llvm.fma.v8f16(<8 x half> [[R_6]], <8 x half> [[TMP7]], <8 x half> [[B]])
+; FULLFP16-NEXT: ret <8 x half> [[R_7]]
 ;
 entry:
   %s0 = shufflevector <8 x half> %a, <8 x half> poison, <8 x i32> zeroinitializer
   %s1 = shufflevector <8 x half> %a, <8 x half> poison, <8 x i32> <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
   %s2 = shufflevector <8 x half> %a, <8 x half> poison, <8 x i32> <i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2, i32 2>
   %s3 = shufflevector <8 x half> %a, <8 x half> poison, <8 x i32> <i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3>
   %s4 = shufflevector <8 x half> %a, <8 x half> poison, <8 x i32> <i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4, i32 4>
   %s5 = shufflevector <8 x half> %a, <8 x half> poison, <8 x i32> <i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5, i32 5>
(16 lines hidden; enclosing context: if.else:)
   ret <8 x half> %r.7
 }

 declare <4 x float> @llvm.fma.v4f32(<4 x float>, <4 x float>, <4 x float>)

 define <4 x float> @sink_shufflevector_fma_v4f32(i1 %c, <8 x float> %a, <4 x float> %b) {
 ; CHECK-LABEL: @sink_shufflevector_fma_v4f32(
 ; CHECK-NEXT: entry:
-; CHECK-NEXT: [[S0:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> poison, <4 x i32> zeroinitializer
-; CHECK-NEXT: [[S1:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
-; CHECK-NEXT: [[S2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <4 x i32> <i32 2, i32 2, i32 2, i32 2>
-; CHECK-NEXT: [[S3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <4 x i32> <i32 3, i32 3, i32 3, i32 3>
 ; CHECK-NEXT: br i1 [[C:%.*]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
 ; CHECK: if.then:
-; CHECK-NEXT: [[R_0:%.*]] = tail call fast <4 x float> @llvm.fma.v4f32(<4 x float> [[B:%.*]], <4 x float> [[S0]], <4 x float> [[B]])
-; CHECK-NEXT: [[R_1:%.*]] = tail call fast <4 x float> @llvm.fma.v4f32(<4 x float> [[R_0]], <4 x float> [[S1]], <4 x float> [[B]])
+; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> poison, <4 x i32> zeroinitializer
+; CHECK-NEXT: [[R_0:%.*]] = tail call fast <4 x float> @llvm.fma.v4f32(<4 x float> [[B:%.*]], <4 x float> [[TMP0]], <4 x float> [[B]])
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
+; CHECK-NEXT: [[R_1:%.*]] = tail call fast <4 x float> @llvm.fma.v4f32(<4 x float> [[R_0]], <4 x float> [[TMP1]], <4 x float> [[B]])
 ; CHECK-NEXT: ret <4 x float> [[R_1]]
 ; CHECK: if.else:
-; CHECK-NEXT: [[R_2:%.*]] = tail call fast <4 x float> @llvm.fma.v4f32(<4 x float> [[B]], <4 x float> [[S2]], <4 x float> [[B]])
-; CHECK-NEXT: [[R_3:%.*]] = tail call fast <4 x float> @llvm.fma.v4f32(<4 x float> [[R_2]], <4 x float> [[S3]], <4 x float> [[B]])
+; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <4 x i32> <i32 2, i32 2, i32 2, i32 2>
+; CHECK-NEXT: [[R_2:%.*]] = tail call fast <4 x float> @llvm.fma.v4f32(<4 x float> [[B]], <4 x float> [[TMP2]], <4 x float> [[B]])
+; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <4 x i32> <i32 3, i32 3, i32 3, i32 3>
+; CHECK-NEXT: [[R_3:%.*]] = tail call fast <4 x float> @llvm.fma.v4f32(<4 x float> [[R_2]], <4 x float> [[TMP3]], <4 x float> [[B]])
 ; CHECK-NEXT: ret <4 x float> [[R_3]]
 ;
 entry:
   %s0 = shufflevector <8 x float> %a, <8 x float> poison, <4 x i32> zeroinitializer
   %s1 = shufflevector <8 x float> %a, <8 x float> poison, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
   %s2 = shufflevector <8 x float> %a, <8 x float> poison, <4 x i32> <i32 2, i32 2, i32 2, i32 2>
   %s3 = shufflevector <8 x float> %a, <8 x float> poison, <4 x i32> <i32 3, i32 3, i32 3, i32 3>
   br i1 %c, label %if.then, label %if.else

 if.then:
   %r.0 = tail call fast <4 x float> @llvm.fma.v4f32(<4 x float> %b, <4 x float> %s0, <4 x float> %b)
   %r.1 = tail call fast <4 x float> @llvm.fma.v4f32(<4 x float> %r.0, <4 x float> %s1, <4 x float> %b)
   ret <4 x float> %r.1

 if.else:
   %r.2 = tail call fast <4 x float> @llvm.fma.v4f32(<4 x float> %b, <4 x float> %s2, <4 x float> %b)
   %r.3 = tail call fast <4 x float> @llvm.fma.v4f32(<4 x float> %r.2, <4 x float> %s3, <4 x float> %b)
   ret <4 x float> %r.3
 }

 define <4 x float> @sink_shufflevector_first_arg_fma_v4f3(i1 %c, <8 x float> %a, <4 x float> %b) {
 ; CHECK-LABEL: @sink_shufflevector_first_arg_fma_v4f3(
 ; CHECK-NEXT: entry:
-; CHECK-NEXT: [[S0:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> poison, <4 x i32> zeroinitializer
-; CHECK-NEXT: [[S1:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
-; CHECK-NEXT: [[S2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <4 x i32> <i32 2, i32 2, i32 2, i32 2>
-; CHECK-NEXT: [[S3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <4 x i32> <i32 3, i32 3, i32 3, i32 3>
 ; CHECK-NEXT: br i1 [[C:%.*]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
 ; CHECK: if.then:
-; CHECK-NEXT: [[R_0:%.*]] = tail call fast <4 x float> @llvm.fma.v4f32(<4 x float> [[S0]], <4 x float> [[B:%.*]], <4 x float> [[B]])
-; CHECK-NEXT: [[R_1:%.*]] = tail call fast <4 x float> @llvm.fma.v4f32(<4 x float> [[S1]], <4 x float> [[R_0]], <4 x float> [[B]])
+; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> poison, <4 x i32> zeroinitializer
+; CHECK-NEXT: [[R_0:%.*]] = tail call fast <4 x float> @llvm.fma.v4f32(<4 x float> [[TMP0]], <4 x float> [[B:%.*]], <4 x float> [[B]])
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
+; CHECK-NEXT: [[R_1:%.*]] = tail call fast <4 x float> @llvm.fma.v4f32(<4 x float> [[TMP1]], <4 x float> [[R_0]], <4 x float> [[B]])
 ; CHECK-NEXT: ret <4 x float> [[R_1]]
 ; CHECK: if.else:
-; CHECK-NEXT: [[R_2:%.*]] = tail call fast <4 x float> @llvm.fma.v4f32(<4 x float> [[S2]], <4 x float> [[B]], <4 x float> [[B]])
-; CHECK-NEXT: [[R_3:%.*]] = tail call fast <4 x float> @llvm.fma.v4f32(<4 x float> [[S3]], <4 x float> [[R_2]], <4 x float> [[B]])
+; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <4 x i32> <i32 2, i32 2, i32 2, i32 2>
+; CHECK-NEXT: [[R_2:%.*]] = tail call fast <4 x float> @llvm.fma.v4f32(<4 x float> [[TMP2]], <4 x float> [[B]], <4 x float> [[B]])
+; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <4 x i32> <i32 3, i32 3, i32 3, i32 3>
+; CHECK-NEXT: [[R_3:%.*]] = tail call fast <4 x float> @llvm.fma.v4f32(<4 x float> [[TMP3]], <4 x float> [[R_2]], <4 x float> [[B]])
 ; CHECK-NEXT: ret <4 x float> [[R_3]]
 ;
 entry:
   %s0 = shufflevector <8 x float> %a, <8 x float> poison, <4 x i32> zeroinitializer
   %s1 = shufflevector <8 x float> %a, <8 x float> poison, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
   %s2 = shufflevector <8 x float> %a, <8 x float> poison, <4 x i32> <i32 2, i32 2, i32 2, i32 2>
   %s3 = shufflevector <8 x float> %a, <8 x float> poison, <4 x i32> <i32 3, i32 3, i32 3, i32 3>
   br i1 %c, label %if.then, label %if.else
(11 lines hidden)

 declare <2 x double> @llvm.fma.v2f64(<2 x double>, <2 x double>, <2 x double>)

 define <2 x double> @sink_shufflevector_fma_v2f64(i1 %c, <2 x double> %a, <2 x double> %b) {
 ; CHECK-LABEL: @sink_shufflevector_fma_v2f64(
 ; CHECK-NEXT: entry:
-; CHECK-NEXT: [[S0:%.*]] = shufflevector <2 x double> [[A:%.*]], <2 x double> poison, <2 x i32> zeroinitializer
-; CHECK-NEXT: [[S1:%.*]] = shufflevector <2 x double> [[A]], <2 x double> poison, <2 x i32> <i32 1, i32 1>
 ; CHECK-NEXT: br i1 [[C:%.*]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
 ; CHECK: if.then:
-; CHECK-NEXT: [[R_0:%.*]] = tail call fast <2 x double> @llvm.fma.v2f64(<2 x double> [[B:%.*]], <2 x double> [[S0]], <2 x double> [[B]])
+; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <2 x double> [[A:%.*]], <2 x double> poison, <2 x i32> zeroinitializer
+; CHECK-NEXT: [[R_0:%.*]] = tail call fast <2 x double> @llvm.fma.v2f64(<2 x double> [[B:%.*]], <2 x double> [[TMP0]], <2 x double> [[B]])
 ; CHECK-NEXT: ret <2 x double> [[R_0]]
 ; CHECK: if.else:
-; CHECK-NEXT: [[R_1:%.*]] = tail call fast <2 x double> @llvm.fma.v2f64(<2 x double> [[B]], <2 x double> [[S1]], <2 x double> [[B]])
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <2 x double> [[A]], <2 x double> poison, <2 x i32> <i32 1, i32 1>
+; CHECK-NEXT: [[R_1:%.*]] = tail call fast <2 x double> @llvm.fma.v2f64(<2 x double> [[B]], <2 x double> [[TMP1]], <2 x double> [[B]])
 ; CHECK-NEXT: ret <2 x double> [[R_1]]
 ;
 entry:
   %s0 = shufflevector <2 x double> %a, <2 x double> poison, <2 x i32> zeroinitializer
   %s1 = shufflevector <2 x double> %a, <2 x double> poison, <2 x i32> <i32 1, i32 1>
   br i1 %c, label %if.then, label %if.else

 if.then:
   %r.0 = tail call fast <2 x double> @llvm.fma.v2f64(<2 x double> %b, <2 x double> %s0, <2 x double> %b)
   ret <2 x double> %r.0

 if.else:
   %r.1 = tail call fast <2 x double> @llvm.fma.v2f64(<2 x double> %b, <2 x double> %s1, <2 x double> %b)
   ret <2 x double> %r.1
 }

define <4 x float> @do_not_sink_out_of_range_shufflevector_fma_v4f32(i1 %c, <8 x float> %a, <4 x float> %b) {		define <4 x float> @do_not_sink_out_of_range_shufflevector_fma_v4f32(i1 %c, <8 x float> %a, <4 x float> %b) {
; CHECK-LABEL: @do_not_sink_out_of_range_shufflevector_fma_v4f32(		; CHECK-LABEL: @do_not_sink_out_of_range_shufflevector_fma_v4f32(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[S4:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> poison, <4 x i32> <i32 4, i32 4, i32 4, i32 4>
; CHECK-NEXT: br i1 [[C:%.*]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]		; CHECK-NEXT: br i1 [[C:%.*]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
; CHECK: if.then:		; CHECK: if.then:
; CHECK-NEXT: [[R:%.*]] = tail call fast <4 x float> @llvm.fma.v4f32(<4 x float> [[B:%.*]], <4 x float> [[S4]], <4 x float> [[B]])		; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> poison, <4 x i32> <i32 4, i32 4, i32 4, i32 4>
		; CHECK-NEXT: [[R:%.*]] = tail call fast <4 x float> @llvm.fma.v4f32(<4 x float> [[B:%.*]], <4 x float> [[TMP0]], <4 x float> [[B]])
; CHECK-NEXT: ret <4 x float> [[R]]		; CHECK-NEXT: ret <4 x float> [[R]]
; CHECK: if.else:		; CHECK: if.else:
; CHECK-NEXT: ret <4 x float> zeroinitializer		; CHECK-NEXT: ret <4 x float> zeroinitializer
;		;
entry:		entry:
%s4 = shufflevector <8 x float> %a, <8 x float> poison, <4 x i32> <i32 4, i32 4, i32 4, i32 4>		%s4 = shufflevector <8 x float> %a, <8 x float> poison, <4 x i32> <i32 4, i32 4, i32 4, i32 4>
br i1 %c, label %if.then, label %if.else		br i1 %c, label %if.then, label %if.else

if.then:		if.then:
%r = tail call fast <4 x float> @llvm.fma.v4f32(<4 x float> %b, <4 x float> %s4, <4 x float> %b)		%r = tail call fast <4 x float> @llvm.fma.v4f32(<4 x float> %b, <4 x float> %s4, <4 x float> %b)
ret <4 x float> %r		ret <4 x float> %r

if.else:		if.else:
ret <4 x float> zeroinitializer		ret <4 x float> zeroinitializer
}		}

declare <5 x float> @llvm.fma.v5f32(<5 x float>, <5 x float>, <5 x float>)		declare <5 x float> @llvm.fma.v5f32(<5 x float>, <5 x float>, <5 x float>)

define <5 x float> @sink_shufflevector_fma_v5f32(i1 %c, <8 x float> %a, <5 x float> %b) {		define <5 x float> @sink_shufflevector_fma_v5f32(i1 %c, <8 x float> %a, <5 x float> %b) {
; CHECK-LABEL: @sink_shufflevector_fma_v5f32(		; CHECK-LABEL: @sink_shufflevector_fma_v5f32(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[S0:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> poison, <5 x i32> zeroinitializer		; CHECK-NEXT: [[S1:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> poison, <5 x i32> <i32 1, i32 1, i32 1, i32 1, i32 4>
; CHECK-NEXT: [[S1:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <5 x i32> <i32 1, i32 1, i32 1, i32 1, i32 4>
; CHECK-NEXT: [[S2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <5 x i32> <i32 2, i32 2, i32 2, i32 2, i32 4>		; CHECK-NEXT: [[S2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <5 x i32> <i32 2, i32 2, i32 2, i32 2, i32 4>
; CHECK-NEXT: [[S3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <5 x i32> <i32 3, i32 3, i32 3, i32 3, i32 4>		; CHECK-NEXT: [[S3:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <5 x i32> <i32 3, i32 3, i32 3, i32 3, i32 4>
; CHECK-NEXT: [[S4:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <5 x i32> <i32 4, i32 4, i32 4, i32 4, i32 4>
; CHECK-NEXT: br i1 [[C:%.*]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]		; CHECK-NEXT: br i1 [[C:%.*]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
; CHECK: if.then:		; CHECK: if.then:
; CHECK-NEXT: [[R_0:%.*]] = tail call fast <5 x float> @llvm.fma.v5f32(<5 x float> [[B:%.*]], <5 x float> [[S0]], <5 x float> [[B]])		; CHECK-NEXT: [[TMP0:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <5 x i32> zeroinitializer
		; CHECK-NEXT: [[R_0:%.*]] = tail call fast <5 x float> @llvm.fma.v5f32(<5 x float> [[B:%.*]], <5 x float> [[TMP0]], <5 x float> [[B]])
; CHECK-NEXT: [[R_1:%.*]] = tail call fast <5 x float> @llvm.fma.v5f32(<5 x float> [[R_0]], <5 x float> [[S1]], <5 x float> [[B]])		; CHECK-NEXT: [[R_1:%.*]] = tail call fast <5 x float> @llvm.fma.v5f32(<5 x float> [[R_0]], <5 x float> [[S1]], <5 x float> [[B]])
; CHECK-NEXT: ret <5 x float> [[R_1]]		; CHECK-NEXT: ret <5 x float> [[R_1]]
; CHECK: if.else:		; CHECK: if.else:
; CHECK-NEXT: [[R_2:%.*]] = tail call fast <5 x float> @llvm.fma.v5f32(<5 x float> [[B]], <5 x float> [[S2]], <5 x float> [[B]])		; CHECK-NEXT: [[R_2:%.*]] = tail call fast <5 x float> @llvm.fma.v5f32(<5 x float> [[B]], <5 x float> [[S2]], <5 x float> [[B]])
; CHECK-NEXT: [[R_3:%.*]] = tail call fast <5 x float> @llvm.fma.v5f32(<5 x float> [[R_2]], <5 x float> [[S3]], <5 x float> [[B]])		; CHECK-NEXT: [[R_3:%.*]] = tail call fast <5 x float> @llvm.fma.v5f32(<5 x float> [[R_2]], <5 x float> [[S3]], <5 x float> [[B]])
; CHECK-NEXT: [[R_4:%.*]] = tail call fast <5 x float> @llvm.fma.v5f32(<5 x float> [[R_3]], <5 x float> [[S4]], <5 x float> [[B]])		; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A]], <8 x float> poison, <5 x i32> <i32 4, i32 4, i32 4, i32 4, i32 4>
		; CHECK-NEXT: [[R_4:%.*]] = tail call fast <5 x float> @llvm.fma.v5f32(<5 x float> [[R_3]], <5 x float> [[TMP1]], <5 x float> [[B]])
; CHECK-NEXT: ret <5 x float> [[R_4]]		; CHECK-NEXT: ret <5 x float> [[R_4]]
;		;
entry:		entry:
%s0 = shufflevector <8 x float> %a, <8 x float> poison, <5 x i32> zeroinitializer		%s0 = shufflevector <8 x float> %a, <8 x float> poison, <5 x i32> zeroinitializer
%s1 = shufflevector <8 x float> %a, <8 x float> poison, <5 x i32> <i32 1, i32 1, i32 1, i32 1, i32 4>		%s1 = shufflevector <8 x float> %a, <8 x float> poison, <5 x i32> <i32 1, i32 1, i32 1, i32 1, i32 4>
%s2 = shufflevector <8 x float> %a, <8 x float> poison, <5 x i32> <i32 2, i32 2, i32 2, i32 2, i32 4>		%s2 = shufflevector <8 x float> %a, <8 x float> poison, <5 x i32> <i32 2, i32 2, i32 2, i32 2, i32 4>
%s3 = shufflevector <8 x float> %a, <8 x float> poison, <5 x i32> <i32 3, i32 3, i32 3, i32 3, i32 4>		%s3 = shufflevector <8 x float> %a, <8 x float> poison, <5 x i32> <i32 3, i32 3, i32 3, i32 3, i32 4>
%s4 = shufflevector <8 x float> %a, <8 x float> poison, <5 x i32> <i32 4, i32 4, i32 4, i32 4, i32 4>		%s4 = shufflevector <8 x float> %a, <8 x float> poison, <5 x i32> <i32 4, i32 4, i32 4, i32 4, i32 4>