Details

Reviewers

spatel
lebedev.ri
RKSimon
dmgreen

Commits

rGdf525c7705d8: [InstCombine] fold fake floating point vector extract to shift+trunc.

Summary

This patch supports the FP part of D111082.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

jacquesguan created this revision. May 17 2022, 1:35 AM

Herald added a project: Restricted Project. May 17 2022, 1:35 AM

Herald added a subscriber: hiraditya.

jacquesguan requested review of this revision. May 17 2022, 1:35 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B164819: Diff 429959. May 17 2022, 1:35 AM

Testing more float types might be a good idea (bfloat / float / double / x86_fp80 / f128 / ppc_f128), even if they're negative tests - not sure how much the isDesirableIntType check is going to interfere?

Add more FP type tests.

Harbormaster completed remote builds in B165275: Diff 430603. May 19 2022, 2:47 AM

Is there a motivating codegen example for this transform? Or some other IR transform that will fire as a result of this transform?

In the case with a shift, we have an extra IR instruction, so this would be a rare fold that increases instruction count. Maybe that's justifiable just because we want to keep the symmetry with the other patterns, but it would be better to show some kind of win from this patch.

llvm/test/Transforms/InstCombine/extractelement.ll
540	This comment doesn't look accurate. The data layout for both RUN lines says i64 is legal, so that's why we allow converting i64 in the test above. And there's no exception for a 1-element vector. We probably should have a test like this to confirm:

	define double @bitcast_fp64vec_index0(i64 %x) {
	  %v = bitcast i64 %x to <1 x double>
	  %r = extractelement <1 x double> %v, i8 0
	  ret double %r
	}

Also make sure there's no infinite loop potential with that kind of pattern...I just noticed an odd behavior for it in D125951.

Please pre-commit the baseline tests, so it's easier to see what is or is not changing.

Add a RUN with a legal i128 to confirm this is behaving as expected?

Add precommit test.

Harbormaster completed remote builds in B165498: Diff 430925. May 20 2022, 2:40 AM

jacquesguan added a parent revision: D126054: [InstCombine] Precommit test for D125750. May 22 2022, 6:47 PM

In D125750#3525445, @spatel wrote:

Is there a motivating codegen example for this transform? Or some other IR transform that will fire as a result of this transform?

In the case with a shift, we have an extra IR instruction, so this would be a rare fold that increases instruction count. Maybe that's justifiable just because we want to keep the symmetry with the other patterns, but it would be better to show some kind of win from this patch.

Mostly, it would be much cheaper to use a scalar shift + cast rather than a bitcast + vector extractelement, even though the former might need one more instruction in LLVM IR. For example, on RISC-V the former would be lowered to 3 scalar instructions, but the latter would first move the value from a GPR to a vector register and then use 2 vector instructions to extract the element. That is truly much more expensive, even without counting the vector configuration instruction that must be inserted before using vector instructions.

llvm/test/Transforms/InstCombine/extractelement.ll
540	Done.

In D125750#3533124, @jacquesguan wrote:

Mostly, it would be much cheaper to use a scalar shift + cast rather than a bitcast + vector extractelement, even though the former might need one more instruction in LLVM IR. For example, on RISC-V the former would be lowered to 3 scalar instructions, but the latter would first move the value from a GPR to a vector register and then use 2 vector instructions to extract the element. That is truly much more expensive, even without counting the vector configuration instruction that must be inserted before using vector instructions.

Yes, I understand the codegen motivation. I should have been more explicit though - that's generally not enough to justify an IR canonicalization if the backend could just as easily do this transform.

We really want to show that the change in IR leads to an improvement in analysis and/or results in even more optimization. Double-check, but we could probably add a test like this:
https://alive2.llvm.org/ce/z/77k-Zg

jacquesguan mentioned this in D126054: [InstCombine] Precommit test for D125750. Aug 25 2022, 11:58 PM

Address comment.

In D125750#3534196, @spatel wrote:

Yes, I understand the codegen motivation. I should have been more explicit though - that's generally not enough to justify an IR canonicalization if the backend could just as easily do this transform.

We really want to show that the change in IR leads to an improvement in analysis and/or results in even more optimization. Double-check, but we could probably add a test like this:
https://alive2.llvm.org/ce/z/77k-Zg

Thanks, I added this test at the end.

Harbormaster completed remote builds in B183531: Diff 455812. Aug 26 2022, 12:55 AM

spatel added inline comments. Aug 26 2022, 11:03 AM

llvm/test/Transforms/InstCombine/extractelement.ll
757–758	We were missing a transform that would reduce this, so I added it here: 482777123427. Please rebase/update to generate new output.

Address comment.

llvm/test/Transforms/InstCombine/extractelement.ll
757–758	Done, thanks.

Harbormaster completed remote builds in B183863: Diff 456263. Aug 29 2022, 12:40 AM

LGTM

This revision is now accepted and ready to land. Aug 29 2022, 7:20 AM

jacquesguan mentioned this in rGf98153eac0ff: [InstCombine] Precommit test for D125750. Aug 29 2022, 6:57 PM

This revision was landed with ongoing or failed builds. Aug 29 2022, 7:12 PM

Closed by commit rGdf525c7705d8: [InstCombine] fold fake floating point vector extract to shift+trunc. (authored by jacquesguan).

This revision was automatically updated to reflect the committed changes.

jacquesguan added a commit: rGdf525c7705d8: [InstCombine] fold fake floating point vector extract to shift+trunc.

Diff 456521

llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp

Instruction *InstCombinerImpl::foldBitcastExtElt(ExtractElementInst &Ext) {

ElementCount NumElts =		ElementCount NumElts =
cast<VectorType>(Ext.getVectorOperandType())->getElementCount();		cast<VectorType>(Ext.getVectorOperandType())->getElementCount();
Type *DestTy = Ext.getType();		Type *DestTy = Ext.getType();
bool IsBigEndian = DL.isBigEndian();		bool IsBigEndian = DL.isBigEndian();

// If we are casting an integer to vector and extracting a portion, that is		// If we are casting an integer to vector and extracting a portion, that is
// a shift-right and truncate.		// a shift-right and truncate.
// TODO: Allow FP dest type by casting the trunc to FP?
if (X->getType()->isIntegerTy() && DestTy->isIntegerTy() &&		if (X->getType()->isIntegerTy() &&
isDesirableIntType(X->getType()->getPrimitiveSizeInBits())) {		isDesirableIntType(X->getType()->getPrimitiveSizeInBits())) {
assert(isa<FixedVectorType>(Ext.getVectorOperand()->getType()) &&		assert(isa<FixedVectorType>(Ext.getVectorOperand()->getType()) &&
"Expected fixed vector type for bitcast from scalar integer");		"Expected fixed vector type for bitcast from scalar integer");

// Big endian requires adjusting the extract index since MSB is at index 0.		// Big endian requires adjusting the extract index since MSB is at index 0.
// LittleEndian: extelt (bitcast i32 X to v4i8), 0 -> trunc i32 X to i8		// LittleEndian: extelt (bitcast i32 X to v4i8), 0 -> trunc i32 X to i8
// BigEndian: extelt (bitcast i32 X to v4i8), 0 -> trunc i32 (X >> 24) to i8		// BigEndian: extelt (bitcast i32 X to v4i8), 0 -> trunc i32 (X >> 24) to i8
if (IsBigEndian)		if (IsBigEndian)
ExtIndexC = NumElts.getKnownMinValue() - 1 - ExtIndexC;		ExtIndexC = NumElts.getKnownMinValue() - 1 - ExtIndexC;
unsigned ShiftAmountC = ExtIndexC * DestTy->getPrimitiveSizeInBits();		unsigned ShiftAmountC = ExtIndexC * DestTy->getPrimitiveSizeInBits();
if (!ShiftAmountC || Ext.getVectorOperand()->hasOneUse()) {		if (!ShiftAmountC || Ext.getVectorOperand()->hasOneUse()) {
Value *Lshr = Builder.CreateLShr(X, ShiftAmountC, "extelt.offset");		Value *Lshr = Builder.CreateLShr(X, ShiftAmountC, "extelt.offset");
		if (DestTy->isFloatingPointTy()) {
		Type *DstIntTy = IntegerType::getIntNTy(
		Lshr->getContext(), DestTy->getPrimitiveSizeInBits());
		Value *Trunc = Builder.CreateTrunc(Lshr, DstIntTy);
		return new BitCastInst(Trunc, DestTy);
		}
return new TruncInst(Lshr, DestTy);		return new TruncInst(Lshr, DestTy);
}		}
}		}

if (!X->getType()->isVectorTy())		if (!X->getType()->isVectorTy())
return nullptr;		return nullptr;

// If this extractelement is using a bitcast from a vector of the same number		// If this extractelement is using a bitcast from a vector of the same number
(remaining lines not shown)
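
To make the fold in this hunk concrete, here is a minimal C++ sketch that models the arithmetic for a 64-bit integer viewed as two float lanes. It is not LLVM code: `laneShiftAmount` and `extractFloatLane` are hypothetical helper names, and the sketch assumes a 32-bit IEEE-754 `float`. The index flip mirrors the `ExtIndexC` adjustment for big-endian targets, and the `memcpy` stands in for the final scalar bitcast that this patch adds for FP destination types.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");

// Hypothetical helper (not an LLVM API): bit offset of lane `index` inside
// the source integer, mirroring the ExtIndexC adjustment in the hunk above.
unsigned laneShiftAmount(unsigned index, unsigned numElts, unsigned eltBits,
                         bool isBigEndian) {
  assert(index < numElts && "lane index out of range");
  if (isBigEndian)
    index = numElts - 1 - index; // the most significant lane is index 0
  return index * eltBits;
}

// Models: extractelement (bitcast i64 x to <2 x float>), index
//   ->    bitcast (trunc (lshr x, shift) to i32) to float
float extractFloatLane(std::uint64_t x, unsigned index, bool isBigEndian) {
  unsigned shift = laneShiftAmount(index, 2, 32, isBigEndian);
  std::uint32_t bits = static_cast<std::uint32_t>(x >> shift); // lshr + trunc
  float result;
  std::memcpy(&result, &bits, sizeof(result)); // scalar bitcast i32 -> float
  return result;
}
```

With `x = 0x3F80000040000000`, the little-endian model returns 2.0f for lane 0 and 1.0f for lane 1, and the big-endian model returns them in the opposite order. A shift amount of zero corresponds to the cases where the expected test output contains only a `trunc` and no `lshr`.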

llvm/test/Transforms/InstCombine/extractelement.ll

	(earlier lines not shown)
	; ANY-NEXT: ret i8 [[R]]			; ANY-NEXT: ret i8 [[R]]
	;			;
	%v = bitcast float %x to <4 x i8>			%v = bitcast float %x to <4 x i8>
	%r = extractelement <4 x i8> %v, i8 0			%r = extractelement <4 x i8> %v, i8 0
	ret i8 %r			ret i8 %r
	}			}

	define half @bitcast_fp16vec_index0(i32 %x) {			define half @bitcast_fp16vec_index0(i32 %x) {
	; ANY-LABEL: @bitcast_fp16vec_index0(			; ANYLE-LABEL: @bitcast_fp16vec_index0(
	; ANY-NEXT: [[V:%.*]] = bitcast i32 [[X:%.*]] to <2 x half>			; ANYLE-NEXT: [[TMP1:%.*]] = trunc i32 [[X:%.*]] to i16
	; ANY-NEXT: [[R:%.*]] = extractelement <2 x half> [[V]], i64 0			; ANYLE-NEXT: [[R:%.*]] = bitcast i16 [[TMP1]] to half
	; ANY-NEXT: ret half [[R]]			; ANYLE-NEXT: ret half [[R]]
				;
				; ANYBE-LABEL: @bitcast_fp16vec_index0(
				; ANYBE-NEXT: [[EXTELT_OFFSET:%.*]] = lshr i32 [[X:%.*]], 16
				; ANYBE-NEXT: [[TMP1:%.*]] = trunc i32 [[EXTELT_OFFSET]] to i16
				; ANYBE-NEXT: [[R:%.*]] = bitcast i16 [[TMP1]] to half
				; ANYBE-NEXT: ret half [[R]]
	;			;
	%v = bitcast i32 %x to <2 x half>			%v = bitcast i32 %x to <2 x half>
	%r = extractelement <2 x half> %v, i8 0			%r = extractelement <2 x half> %v, i8 0
	ret half %r			ret half %r
	}			}

	define half @bitcast_fp16vec_index1(i32 %x) {			define half @bitcast_fp16vec_index1(i32 %x) {
	; ANY-LABEL: @bitcast_fp16vec_index1(			; ANYLE-LABEL: @bitcast_fp16vec_index1(
	; ANY-NEXT: [[V:%.*]] = bitcast i32 [[X:%.*]] to <2 x half>			; ANYLE-NEXT: [[EXTELT_OFFSET:%.*]] = lshr i32 [[X:%.*]], 16
	; ANY-NEXT: [[R:%.*]] = extractelement <2 x half> [[V]], i64 1			; ANYLE-NEXT: [[TMP1:%.*]] = trunc i32 [[EXTELT_OFFSET]] to i16
	; ANY-NEXT: ret half [[R]]			; ANYLE-NEXT: [[R:%.*]] = bitcast i16 [[TMP1]] to half
				; ANYLE-NEXT: ret half [[R]]
				;
				; ANYBE-LABEL: @bitcast_fp16vec_index1(
				; ANYBE-NEXT: [[TMP1:%.*]] = trunc i32 [[X:%.*]] to i16
				; ANYBE-NEXT: [[R:%.*]] = bitcast i16 [[TMP1]] to half
				; ANYBE-NEXT: ret half [[R]]
	;			;
	%v = bitcast i32 %x to <2 x half>			%v = bitcast i32 %x to <2 x half>
	%r = extractelement <2 x half> %v, i8 1			%r = extractelement <2 x half> %v, i8 1
	ret half %r			ret half %r
	}			}

	define bfloat @bitcast_bfp16vec_index0(i32 %x) {			define bfloat @bitcast_bfp16vec_index0(i32 %x) {
	; ANY-LABEL: @bitcast_bfp16vec_index0(			; ANYLE-LABEL: @bitcast_bfp16vec_index0(
	; ANY-NEXT: [[V:%.*]] = bitcast i32 [[X:%.*]] to <2 x bfloat>			; ANYLE-NEXT: [[TMP1:%.*]] = trunc i32 [[X:%.*]] to i16
	; ANY-NEXT: [[R:%.*]] = extractelement <2 x bfloat> [[V]], i64 0			; ANYLE-NEXT: [[R:%.*]] = bitcast i16 [[TMP1]] to bfloat
	; ANY-NEXT: ret bfloat [[R]]			; ANYLE-NEXT: ret bfloat [[R]]
				;
				; ANYBE-LABEL: @bitcast_bfp16vec_index0(
				; ANYBE-NEXT: [[EXTELT_OFFSET:%.*]] = lshr i32 [[X:%.*]], 16
				; ANYBE-NEXT: [[TMP1:%.*]] = trunc i32 [[EXTELT_OFFSET]] to i16
				; ANYBE-NEXT: [[R:%.*]] = bitcast i16 [[TMP1]] to bfloat
				; ANYBE-NEXT: ret bfloat [[R]]
	;			;
	%v = bitcast i32 %x to <2 x bfloat>			%v = bitcast i32 %x to <2 x bfloat>
	%r = extractelement <2 x bfloat> %v, i8 0			%r = extractelement <2 x bfloat> %v, i8 0
	ret bfloat %r			ret bfloat %r
	}			}

	define bfloat @bitcast_bfp16vec_index1(i32 %x) {			define bfloat @bitcast_bfp16vec_index1(i32 %x) {
	; ANY-LABEL: @bitcast_bfp16vec_index1(			; ANYLE-LABEL: @bitcast_bfp16vec_index1(
	; ANY-NEXT: [[V:%.*]] = bitcast i32 [[X:%.*]] to <2 x bfloat>			; ANYLE-NEXT: [[EXTELT_OFFSET:%.*]] = lshr i32 [[X:%.*]], 16
	; ANY-NEXT: [[R:%.*]] = extractelement <2 x bfloat> [[V]], i64 1			; ANYLE-NEXT: [[TMP1:%.*]] = trunc i32 [[EXTELT_OFFSET]] to i16
	; ANY-NEXT: ret bfloat [[R]]			; ANYLE-NEXT: [[R:%.*]] = bitcast i16 [[TMP1]] to bfloat
				; ANYLE-NEXT: ret bfloat [[R]]
				;
				; ANYBE-LABEL: @bitcast_bfp16vec_index1(
				; ANYBE-NEXT: [[TMP1:%.*]] = trunc i32 [[X:%.*]] to i16
				; ANYBE-NEXT: [[R:%.*]] = bitcast i16 [[TMP1]] to bfloat
				; ANYBE-NEXT: ret bfloat [[R]]
	;			;
	%v = bitcast i32 %x to <2 x bfloat>			%v = bitcast i32 %x to <2 x bfloat>
	%r = extractelement <2 x bfloat> %v, i8 1			%r = extractelement <2 x bfloat> %v, i8 1
	ret bfloat %r			ret bfloat %r
	}			}

	define float @bitcast_fp32vec_index0(i64 %x) {			define float @bitcast_fp32vec_index0(i64 %x) {
	; ANY-LABEL: @bitcast_fp32vec_index0(			; LE64-LABEL: @bitcast_fp32vec_index0(
	; ANY-NEXT: [[V:%.*]] = bitcast i64 [[X:%.*]] to <2 x float>			; LE64-NEXT: [[TMP1:%.*]] = trunc i64 [[X:%.*]] to i32
	; ANY-NEXT: [[R:%.*]] = extractelement <2 x float> [[V]], i64 0			; LE64-NEXT: [[R:%.*]] = bitcast i32 [[TMP1]] to float
	; ANY-NEXT: ret float [[R]]			; LE64-NEXT: ret float [[R]]
				;
				; LE128-LABEL: @bitcast_fp32vec_index0(
				; LE128-NEXT: [[V:%.*]] = bitcast i64 [[X:%.*]] to <2 x float>
				; LE128-NEXT: [[R:%.*]] = extractelement <2 x float> [[V]], i64 0
				; LE128-NEXT: ret float [[R]]
				;
				; BE64-LABEL: @bitcast_fp32vec_index0(
				; BE64-NEXT: [[EXTELT_OFFSET:%.*]] = lshr i64 [[X:%.*]], 32
				; BE64-NEXT: [[TMP1:%.*]] = trunc i64 [[EXTELT_OFFSET]] to i32
				; BE64-NEXT: [[R:%.*]] = bitcast i32 [[TMP1]] to float
				; BE64-NEXT: ret float [[R]]
				;
				; BE128-LABEL: @bitcast_fp32vec_index0(
				; BE128-NEXT: [[V:%.*]] = bitcast i64 [[X:%.*]] to <2 x float>
				; BE128-NEXT: [[R:%.*]] = extractelement <2 x float> [[V]], i64 0
				; BE128-NEXT: ret float [[R]]
	;			;
	%v = bitcast i64 %x to <2 x float>			%v = bitcast i64 %x to <2 x float>
	%r = extractelement <2 x float> %v, i8 0			%r = extractelement <2 x float> %v, i8 0
	ret float %r			ret float %r
	}			}

	define float @bitcast_fp32vec_index1(i64 %x) {			define float @bitcast_fp32vec_index1(i64 %x) {
	; ANY-LABEL: @bitcast_fp32vec_index1(			; LE64-LABEL: @bitcast_fp32vec_index1(
	; ANY-NEXT: [[V:%.*]] = bitcast i64 [[X:%.*]] to <2 x float>			; LE64-NEXT: [[EXTELT_OFFSET:%.*]] = lshr i64 [[X:%.*]], 32
	; ANY-NEXT: [[R:%.*]] = extractelement <2 x float> [[V]], i64 1			; LE64-NEXT: [[TMP1:%.*]] = trunc i64 [[EXTELT_OFFSET]] to i32
	; ANY-NEXT: ret float [[R]]			; LE64-NEXT: [[R:%.*]] = bitcast i32 [[TMP1]] to float
				; LE64-NEXT: ret float [[R]]
				;
				; LE128-LABEL: @bitcast_fp32vec_index1(
				; LE128-NEXT: [[V:%.*]] = bitcast i64 [[X:%.*]] to <2 x float>
				; LE128-NEXT: [[R:%.*]] = extractelement <2 x float> [[V]], i64 1
				; LE128-NEXT: ret float [[R]]
				;
				; BE64-LABEL: @bitcast_fp32vec_index1(
				; BE64-NEXT: [[TMP1:%.*]] = trunc i64 [[X:%.*]] to i32
				; BE64-NEXT: [[R:%.*]] = bitcast i32 [[TMP1]] to float
				; BE64-NEXT: ret float [[R]]
				;
				; BE128-LABEL: @bitcast_fp32vec_index1(
				; BE128-NEXT: [[V:%.*]] = bitcast i64 [[X:%.*]] to <2 x float>
				; BE128-NEXT: [[R:%.*]] = extractelement <2 x float> [[V]], i64 1
				; BE128-NEXT: ret float [[R]]
	;			;
	%v = bitcast i64 %x to <2 x float>			%v = bitcast i64 %x to <2 x float>
	%r = extractelement <2 x float> %v, i8 1			%r = extractelement <2 x float> %v, i8 1
	ret float %r			ret float %r
	}			}

	define double @bitcast_fp64vec64_index0(i64 %x) {			define double @bitcast_fp64vec64_index0(i64 %x) {
	; ANY-LABEL: @bitcast_fp64vec64_index0(			; LE64-LABEL: @bitcast_fp64vec64_index0(
	; ANY-NEXT: [[V:%.*]] = bitcast i64 [[X:%.*]] to <1 x double>			; LE64-NEXT: [[R:%.*]] = bitcast i64 [[X:%.*]] to double
	; ANY-NEXT: [[R:%.*]] = extractelement <1 x double> [[V]], i64 0			; LE64-NEXT: ret double [[R]]
	; ANY-NEXT: ret double [[R]]			;
				; LE128-LABEL: @bitcast_fp64vec64_index0(
				; LE128-NEXT: [[V:%.*]] = bitcast i64 [[X:%.*]] to <1 x double>
				; LE128-NEXT: [[R:%.*]] = extractelement <1 x double> [[V]], i64 0
				; LE128-NEXT: ret double [[R]]
				;
				; BE64-LABEL: @bitcast_fp64vec64_index0(
				; BE64-NEXT: [[R:%.*]] = bitcast i64 [[X:%.*]] to double
				; BE64-NEXT: ret double [[R]]
				;
				; BE128-LABEL: @bitcast_fp64vec64_index0(
				; BE128-NEXT: [[V:%.*]] = bitcast i64 [[X:%.*]] to <1 x double>
				; BE128-NEXT: [[R:%.*]] = extractelement <1 x double> [[V]], i64 0
				; BE128-NEXT: ret double [[R]]
	;			;
	%v = bitcast i64 %x to <1 x double>			%v = bitcast i64 %x to <1 x double>
	%r = extractelement <1 x double> %v, i8 0			%r = extractelement <1 x double> %v, i8 0
	ret double %r			ret double %r
	}			}

	define double @bitcast_fp64vec_index0(i128 %x) {			define double @bitcast_fp64vec_index0(i128 %x) {
	; ANY-LABEL: @bitcast_fp64vec_index0(			; LE64-LABEL: @bitcast_fp64vec_index0(
	; ANY-NEXT: [[V:%.*]] = bitcast i128 [[X:%.*]] to <2 x double>			; LE64-NEXT: [[V:%.*]] = bitcast i128 [[X:%.*]] to <2 x double>
	; ANY-NEXT: [[R:%.*]] = extractelement <2 x double> [[V]], i64 0			; LE64-NEXT: [[R:%.*]] = extractelement <2 x double> [[V]], i64 0
	; ANY-NEXT: ret double [[R]]			; LE64-NEXT: ret double [[R]]
				;
				; LE128-LABEL: @bitcast_fp64vec_index0(
				; LE128-NEXT: [[TMP1:%.*]] = trunc i128 [[X:%.*]] to i64
				; LE128-NEXT: [[R:%.*]] = bitcast i64 [[TMP1]] to double
				; LE128-NEXT: ret double [[R]]
				;
				; BE64-LABEL: @bitcast_fp64vec_index0(
				; BE64-NEXT: [[V:%.*]] = bitcast i128 [[X:%.*]] to <2 x double>
				; BE64-NEXT: [[R:%.*]] = extractelement <2 x double> [[V]], i64 0
				; BE64-NEXT: ret double [[R]]
				;
				; BE128-LABEL: @bitcast_fp64vec_index0(
				; BE128-NEXT: [[EXTELT_OFFSET:%.*]] = lshr i128 [[X:%.*]], 64
				; BE128-NEXT: [[TMP1:%.*]] = trunc i128 [[EXTELT_OFFSET]] to i64
				; BE128-NEXT: [[R:%.*]] = bitcast i64 [[TMP1]] to double
				; BE128-NEXT: ret double [[R]]
	;			;
	%v = bitcast i128 %x to <2 x double>			%v = bitcast i128 %x to <2 x double>
	%r = extractelement <2 x double> %v, i8 0			%r = extractelement <2 x double> %v, i8 0
	ret double %r			ret double %r
	}			}

	define double @bitcast_fp64vec_index1(i128 %x) {			define double @bitcast_fp64vec_index1(i128 %x) {
	; ANY-LABEL: @bitcast_fp64vec_index1(			; LE64-LABEL: @bitcast_fp64vec_index1(
	; ANY-NEXT: [[V:%.*]] = bitcast i128 [[X:%.*]] to <2 x double>			; LE64-NEXT: [[V:%.*]] = bitcast i128 [[X:%.*]] to <2 x double>
	; ANY-NEXT: [[R:%.*]] = extractelement <2 x double> [[V]], i64 1			; LE64-NEXT: [[R:%.*]] = extractelement <2 x double> [[V]], i64 1
	; ANY-NEXT: ret double [[R]]			; LE64-NEXT: ret double [[R]]
				;
				; LE128-LABEL: @bitcast_fp64vec_index1(
				; LE128-NEXT: [[EXTELT_OFFSET:%.*]] = lshr i128 [[X:%.*]], 64
				; LE128-NEXT: [[TMP1:%.*]] = trunc i128 [[EXTELT_OFFSET]] to i64
				; LE128-NEXT: [[R:%.*]] = bitcast i64 [[TMP1]] to double
				; LE128-NEXT: ret double [[R]]
				;
				; BE64-LABEL: @bitcast_fp64vec_index1(
				; BE64-NEXT: [[V:%.*]] = bitcast i128 [[X:%.*]] to <2 x double>
				; BE64-NEXT: [[R:%.*]] = extractelement <2 x double> [[V]], i64 1
				; BE64-NEXT: ret double [[R]]
				;
				; BE128-LABEL: @bitcast_fp64vec_index1(
				; BE128-NEXT: [[TMP1:%.*]] = trunc i128 [[X:%.*]] to i64
				; BE128-NEXT: [[R:%.*]] = bitcast i64 [[TMP1]] to double
				; BE128-NEXT: ret double [[R]]
	;			;
	%v = bitcast i128 %x to <2 x double>			%v = bitcast i128 %x to <2 x double>
	%r = extractelement <2 x double> %v, i8 1			%r = extractelement <2 x double> %v, i8 1
	ret double %r			ret double %r
	}			}
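
The LE64/LE128/BE64/BE128 split in the expected output above follows from the `isDesirableIntType` gate on the source integer width. The sketch below is a standalone model of that gate, not the LLVM implementation. It assumes that 8-, 16-, and 32-bit integers are always accepted and that any other width is accepted only when the data layout lists it as native, and it assumes each RUN configuration declares a single native width (64 or 128). `isDesirableIntTypeModel` and its `nativeWidth` parameter are hypothetical names.

```cpp
#include <cassert>

// Standalone model of the width gate (an assumption, see the text above).
// `nativeWidth` stands in for the single native integer width that the
// data layout of a given RUN line is assumed to declare.
bool isDesirableIntTypeModel(unsigned bitWidth, unsigned nativeWidth) {
  assert(bitWidth != 0 && "integer types have at least one bit");
  switch (bitWidth) {
  case 8:
  case 16:
  case 32:
    return true; // small widths are assumed to be accepted unconditionally
  default:
    return bitWidth == nativeWidth; // otherwise the width must be native
  }
}
```

Under these assumptions the model agrees with the checks: the i32 sources fold in every configuration (ANYLE/ANYBE), the i64 sources fold only with a native 64-bit width, and the i128 sources fold only with a native 128-bit width.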

	; negative test - input integer should be legal			; negative test - input integer should be legal

	(lines not shown)
	;			;

	%v = bitcast i64 %x to <8 x i8>			%v = bitcast i64 %x to <8 x i8>
	call void @use(<8 x i8> %v)			call void @use(<8 x i8> %v)
	%r = extractelement <8 x i8> %v, i64 0			%r = extractelement <8 x i8> %v, i64 0
	ret i8 %r			ret i8 %r
	}			}

	define i1 @bit_extract_cmp(i64 %x) {			define i1 @bit_extract_cmp(i64 %x) {
	; ANY-LABEL: @bit_extract_cmp(			; LE64-LABEL: @bit_extract_cmp(
	; ANY-NEXT: [[V:%.*]] = bitcast i64 [[X:%.*]] to <2 x float>			; LE64-NEXT: [[TMP1:%.*]] = and i64 [[X:%.*]], 9223372032559808512
	; ANY-NEXT: [[E:%.*]] = extractelement <2 x float> [[V]], i64 1			; LE64-NEXT: [[R:%.*]] = icmp eq i64 [[TMP1]], 0
	; ANY-NEXT: [[R:%.*]] = fcmp oeq float [[E]], 0.000000e+00			; LE64-NEXT: ret i1 [[R]]
	; ANY-NEXT: ret i1 [[R]]			;
				; LE128-LABEL: @bit_extract_cmp(
				; LE128-NEXT: [[V:%.*]] = bitcast i64 [[X:%.*]] to <2 x float>
				; LE128-NEXT: [[E:%.*]] = extractelement <2 x float> [[V]], i64 1
				; LE128-NEXT: [[R:%.*]] = fcmp oeq float [[E]], 0.000000e+00
				; LE128-NEXT: ret i1 [[R]]
				;
				; BE64-LABEL: @bit_extract_cmp(
				; BE64-NEXT: [[TMP1:%.*]] = and i64 [[X:%.*]], 2147483647
				; BE64-NEXT: [[R:%.*]] = icmp eq i64 [[TMP1]], 0
				; BE64-NEXT: ret i1 [[R]]
				;
				; BE128-LABEL: @bit_extract_cmp(
				; BE128-NEXT: [[V:%.*]] = bitcast i64 [[X:%.*]] to <2 x float>
				; BE128-NEXT: [[E:%.*]] = extractelement <2 x float> [[V]], i64 1
				; BE128-NEXT: [[R:%.*]] = fcmp oeq float [[E]], 0.000000e+00
				; BE128-NEXT: ret i1 [[R]]
	;			;
	%v = bitcast i64 %x to <2 x float>			%v = bitcast i64 %x to <2 x float>
	%e = extractelement <2 x float> %v, i8 1			%e = extractelement <2 x float> %v, i8 1
	%r = fcmp oeq float %e, 0.0			%r = fcmp oeq float %e, 0.0
	ret i1 %r			ret i1 %r
	}			}
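
The `and`/`icmp` output for `bit_extract_cmp` can be sanity-checked with a small C++ model of the little-endian case. This is a sketch under assumptions, not part of the patch: it assumes a 32-bit IEEE-754 `float` and default floating-point settings (no flush-to-zero), and the function names are hypothetical. The mask 9223372032559808512 is 0x7FFFFFFF00000000, that is, every bit of the upper lane except its sign bit, so the integer compare is true exactly when the upper float is +0.0 or -0.0.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");

// Original form: extract lane 1 (little-endian) and compare it with 0.0.
bool viaFloatCompare(std::uint64_t x) {
  std::uint32_t bits = static_cast<std::uint32_t>(x >> 32); // lshr + trunc
  float e;
  std::memcpy(&e, &bits, sizeof(e)); // scalar bitcast i32 -> float
  return e == 0.0f; // like fcmp oeq: true for +0.0 and -0.0, false for NaN
}

// Folded form from the LE64 checks: mask off the lane's sign bit and test
// the remaining exponent and mantissa bits against zero.
bool viaIntegerMask(std::uint64_t x) {
  return (x & 9223372032559808512ULL) == 0;
}

// Convenience check that both forms agree for one input.
void checkAgreement(std::uint64_t x) {
  assert(viaFloatCompare(x) == viaIntegerMask(x));
}
```

The BE64 mask 2147483647 is 0x7FFFFFFF, which selects the same lane bits when lane 1 occupies the low half of the integer.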

This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] fold fake floating point vector extract to shift+trunc.
ClosedPublic
