Details

Reviewers

foad
spatel
sdesmalen
david-arm

Commits

rG3c1f0e9ef89f: [InstSimplify] Add constant fold for extractelement + splat for scalable vectors

Summary

This patch allows constant folding of an extractelement from a splat constant for scalable vectors,
but only when the lane index is lower than the minimum number of elements of the vector.
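
The rule in the summary can be sketched with a toy model in plain C++. This is only an illustration under stated assumptions: `SplatModel`, `FoldResult`, and `foldExtractFromSplat` are hypothetical names that are not part of LLVM, and the model ignores everything except the lane bound.

```cpp
#include <cstdint>

// Hypothetical stand-in for a splat constant of type <vscale x MinElts x T>.
struct SplatModel {
  std::uint64_t MinElts; // known minimum element count (the N in <vscale x N x T>)
  int Value;             // the value replicated in every lane
};

// Hypothetical result type: Folded is false when the extract is left alone.
struct FoldResult {
  bool Folded;
  int Value;
};

// Fold only when the lane exists for every possible vscale (vscale >= 1),
// that is, when Lane is below the known minimum element count.
constexpr FoldResult foldExtractFromSplat(SplatModel S, std::uint64_t Lane) {
  return Lane < S.MinElts ? FoldResult{true, S.Value} : FoldResult{false, 0};
}
```

With a minimum of 4 elements, lane 1 folds to the splat value, while lane 4 is left unfolded because it only exists when vscale is at least 2.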

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

CarolineConcatto created this revision. May 26 2021, 9:22 AM

Herald added subscribers: dexonsmith, hiraditya. May 26 2021, 9:22 AM

CarolineConcatto requested review of this revision. May 26 2021, 9:22 AM

Herald added a project: Restricted Project. May 26 2021, 9:22 AM

Herald added a subscriber: llvm-commits.

Matt added a subscriber: Matt. May 26 2021, 9:28 AM

CarolineConcatto edited the summary of this revision. May 26 2021, 9:29 AM

CarolineConcatto added reviewers: foad, spatel, sdesmalen, david-arm.

CarolineConcatto mentioned this in D102404: [InstCombine] Add instcombine fold for extractelement + splat for scalable vectors. May 26 2021, 9:39 AM

The implementation in this patch was previously in:
https://reviews.llvm.org/D101916.

Harbormaster completed remote builds in B106315: Diff 347994. May 26 2021, 9:41 AM

Thanks for splitting it up. I like small patches.

llvm/lib/IR/ConstantFold.cpp
910–911	You should be able to remove the two lines above handling the CAZ case, because your new code will handle it.

In D103180#2782861, @foad wrote:

Thanks for splitting it up. I like small patches.

Definitely agree. :)

For the test where the index is presumed invalid (i32 -1), is that enforced somehow? Is there a hard limit somewhere that says i32 0xffffffff must be invalid?

Hi @foad,
I don't know if I understood your suggestion correctly.
I did not change any lines.
I have tried some combinations of `if` removals, but they all broke some tests.

llvm/lib/IR/ConstantFold.cpp
910–911	Hi @foad, I believe we should keep these two tests. `!CIdx->uge(ValSVTy->getMinNumElements())` limits the index to be lower than the minimum width of the scalable vector, and the check should apply only to scalable vectors because of the `return CAZ->getElementValue(CIdx->getZExtValue());` transformation. If we change it to cover all vector types, many other tests fail, for instance CodeGen/AMDGPU/amdgpu-codegenprepare-idiv.ll and Transforms/LoopVectorize/X86/gather_scatter.ll.

sdesmalen added inline comments. Jun 1 2021, 5:37 AM

llvm/lib/IR/ConstantFold.cpp
910–911	Hi @CarolineConcatto, I think @foad meant removing the lines `if (auto *CAZ = dyn_cast<ConstantAggregateZero>(Val)) return CAZ->getElementValue(CIdx->getZExtValue());`. If `Val` is a constant aggregate zero (<=> all zeroes), then this is a splat value that's recognized by `getSplatValue`, so the new lines added in this patch will handle the same case, making the above lines redundant.
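
The point that an all-zero vector is just one particular splat can be sketched with a toy model over explicit lanes. This is a sketch under stated assumptions: `SplatQuery` and `querySplat` are hypothetical names, not LLVM's `getSplatValue`, and a real scalable vector cannot be enumerated lane by lane like this.

```cpp
#include <vector>

// Hypothetical result of the splat query below.
struct SplatQuery {
  bool IsSplat;
  int Value;
};

// Toy splat query: a vector is a splat when every lane holds the same value.
SplatQuery querySplat(const std::vector<int> &Lanes) {
  if (Lanes.empty())
    return SplatQuery{false, 0};
  for (int Lane : Lanes)
    if (Lane != Lanes.front())
      return SplatQuery{false, 0};
  return SplatQuery{true, Lanes.front()};
}
```

In this model a vector of zeros is reported as a splat of 0, so a code path that handles splats in general also covers the all-zero case and a separate zero-only branch adds nothing.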

Replace the vector-of-zeros check with a splat vector check

Hi @foad,
Sorry for my misinterpretation.
I believe it is now as you suggested.

llvm/lib/IR/ConstantFold.cpp
910–911	Thank you @sdesmalen, that makes sense. Change made.

Harbormaster completed remote builds in B107025: Diff 348954. Jun 1 2021, 7:23 AM

In D103180#2782981, @spatel wrote:

In D103180#2782861, @foad wrote:

Thanks for splitting it up. I like small patches.

Definitely agree. :)

For the test where the index is presumed invalid (i32 -1), is that enforced somehow? Is there a hard limit somewhere that says i32 0xffffffff must be invalid?

Sorry, @spatel,
I missed your comment above.
I think the reason the index is considered invalid is the check
`uge(ValSVTy->getMinNumElements())`, which is an unsigned greater-or-equal comparison.

In D103180#2793718, @CarolineConcatto wrote:

In D103180#2782981, @spatel wrote:

For the test where the index is presumed invalid (i32 -1), is that enforced somehow? Is there a hard limit somewhere that says i32 0xffffffff must be invalid?

I think the reason the index is considered invalid is the check
`uge(ValSVTy->getMinNumElements())`, which is an unsigned greater-or-equal comparison.

Hmm...but those are uint64_t types/compares, so I don't see how it would automatically be considered invalid to have a "i32 -1". I don't think it really matters to this patch. I was just curious if there's some implicit semantic difference being verified between the tests "extractconstant_shuffle_maybe_out_of_range" and "extractconstant_shuffle_invalid_index".

So no objection from me, but someone more directly involved with scalable vectors should probably give final approval.
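
The arithmetic behind the `i32 -1` question can be sketched in plain C++. The helper names `zextLane` and `laneKnownInRange` are hypothetical and nothing here is LLVM API. The sketch only shows that an all-ones 32-bit index, read as unsigned, is 4294967295 and so fails a "lower than the minimum element count" check. It does not settle whether such an index is invalid for every possible vscale.

```cpp
#include <cstdint>

// Zero-extend a 32-bit lane index, as an unsigned comparison would see it.
constexpr std::uint64_t zextLane(std::int32_t Lane) {
  return static_cast<std::uint64_t>(static_cast<std::uint32_t>(Lane));
}

// True only when the lane is below the known minimum element count.
constexpr bool laneKnownInRange(std::int32_t Lane, std::uint64_t MinElts) {
  return zextLane(Lane) < MinElts;
}
```

For a `<vscale x 4 x i32>` operand, both index 4 and index -1 fail this check, so from the point of view of the bound alone the two tests exercise the same "do not fold" path.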

The patch looks fine to me with the nit addressed.

llvm/lib/IR/ConstantFold.cpp
909–910	nit: `s/!CIdx->uge/CIdx->ult/`

This revision is now accepted and ready to land. Jun 3 2021, 12:41 AM

CarolineConcatto added inline comments. Jun 3 2021, 1:23 AM

llvm/lib/IR/ConstantFold.cpp
909–910	Hey @sdesmalen, I may be completely wrong, but I believe your suggestion is not possible (see https://llvm.org/doxygen/classllvm_1_1ConstantInt.html). If I read the documentation correctly, `llvm::ConstantInt` only has `uge`.

sdesmalen added inline comments. Jun 3 2021, 1:46 AM

llvm/lib/IR/ConstantFold.cpp
909–910	Ah, I hadn't realised that. In that case, could you write it more explicitly as `CIdx->getZExtValue() < ValSVTy->getMinNumElements()`?

dmgreen added a subscriber: dmgreen. Jun 3 2021, 1:58 AM

dmgreen added inline comments.

llvm/lib/IR/ConstantFold.cpp
909–910	Do we know the type of CIdx is always less than 64 bits? `uge()` is just a convenience wrapper around `APInt::uge()`. You can use `CIdx->getValue().ult(..)` if you feel `ult` is better than `!uge`.

sdesmalen added inline comments. Jun 3 2021, 2:00 AM

llvm/lib/IR/ConstantFold.cpp
909–910	Agreed, that's better, my main concern was the `!uge`.
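
The concern about index types wider than 64 bits can be illustrated with a toy model. This is a sketch under stated assumptions: `WideIndex`, `ultFullWidth`, and `ultLowBitsOnly` are hypothetical names, and the code does not reproduce the behaviour of `APInt` or `getZExtValue()`. It only shows why comparing at full width is the safer habit.

```cpp
#include <cstdint>

// Hypothetical 128-bit unsigned index, split into two 64-bit halves.
struct WideIndex {
  std::uint64_t Hi;
  std::uint64_t Lo;
};

// Unsigned less-than that takes the full 128-bit value into account.
constexpr bool ultFullWidth(WideIndex Idx, std::uint64_t Limit) {
  return Idx.Hi == 0 && Idx.Lo < Limit;
}

// Comparison that only inspects the low 64 bits of the index.
constexpr bool ultLowBitsOnly(WideIndex Idx, std::uint64_t Limit) {
  return Idx.Lo < Limit;
}
```

For the index 2^64 + 1 (`Hi = 1`, `Lo = 1`) and a limit of 4, the low-bits comparison reports "in range" while the full-width comparison correctly reports "out of range".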

Replace the use of !CIdx->uge() with CIdx->getValue().ult(..)

CarolineConcatto marked 2 inline comments as done. Jun 3 2021, 3:09 AM

Harbormaster completed remote builds in B107423: Diff 349507. Jun 3 2021, 3:59 AM

The patch looks OK to me. I added some inline comments but you don't need to do anything about them, they are just observations.

llvm/lib/IR/ConstantFold.cpp
909–910	Or you could implement ConstantInt::ult.
909–910	I don't really understand why this special case for scalable types exists here. Why not just fall through to the call to `Val->getAggregateElement` below? It looks like getAggregateElement already has some support for scalable types.

update test vector_splat_indices_nxv2i64_ext0 in gep-vector-indices.ll

Harbormaster completed remote builds in B108192: Diff 350577. Jun 8 2021, 5:11 AM

Matt added inline comments. Jun 8 2021, 8:38 AM

llvm/test/Transforms/InstSimplify/ConstProp/extractelement-vscale.ll
15	(Tiny nit:) Perhaps "extracting" would fit a little bit better here?

s/extract/extracting in test comments

CarolineConcatto marked an inline comment as done. Jun 9 2021, 12:23 AM

Harbormaster completed remote builds in B108349: Diff 350798. Jun 9 2021, 1:13 AM

remove scalable vector check because getAggregateElement has safeguards for scalable vectors

Harbormaster completed remote builds in B108380: Diff 350842. Jun 9 2021, 4:19 AM

Closed by commit rG3c1f0e9ef89f: [InstSimplify] Add constant fold for extractelement + splat for scalable vectors (authored by CarolineConcatto). Jun 10 2021, 4:42 AM

This revision was automatically updated to reflect the committed changes.

CarolineConcatto added a commit: rG3c1f0e9ef89f: [InstSimplify] Add constant fold for extractelement + splat for scalable vectors.

Diff 351132

llvm/lib/IR/ConstantFold.cpp

[900 lines not shown; enclosing context: if (auto *GEP = dyn_cast<GEPOperator>(CE)) {]
                                  APSInt(CIdx->getValue()))) {
            return CE->getOperand(1);
          } else {
            return ConstantExpr::getExtractElement(CE->getOperand(0), CIdx);
          }
        }
      }
    }

-   // CAZ of type ScalableVectorType and n < CAZ->getMinNumElements() =>
-   //   extractelt CAZ, n -> 0
-   if (auto *ValSVTy = dyn_cast<ScalableVectorType>(Val->getType())) {
-     if (!CIdx->uge(ValSVTy->getMinNumElements())) {
-       if (auto *CAZ = dyn_cast<ConstantAggregateZero>(Val))
-         return CAZ->getElementValue(CIdx->getZExtValue());
-     }
-     return nullptr;
-   }
+   // Lane < Splat minimum vector width => extractelt Splat(x), Lane -> x
+   if (CIdx->getValue().ult(ValVTy->getElementCount().getKnownMinValue())) {
+     if (Constant *SplatVal = Val->getSplatValue())
+       return SplatVal;
+   }

    return Val->getAggregateElement(CIdx);
  }

  Constant *llvm::ConstantFoldInsertElementInstruction(Constant *Val,
                                                       Constant *Elt,
                                                       Constant *Idx) {
[1,747 lines not shown]

llvm/test/Transforms/InstCombine/gep-vector-indices.ll

  ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
  ; RUN: opt -instcombine %s -S | FileCheck %s

  define i32* @vector_splat_indices_v2i64_ext0(i32* %a) {
  ; CHECK-LABEL: @vector_splat_indices_v2i64_ext0(
  ; CHECK-NEXT:    [[RES:%.*]] = getelementptr i32, i32* [[A:%.*]], i64 4
  ; CHECK-NEXT:    ret i32* [[RES]]
  ;
    %gep = getelementptr i32, i32* %a, <2 x i64> <i64 4, i64 4>
    %res = extractelement <2 x i32*> %gep, i32 0
    ret i32* %res
  }

  define i32* @vector_splat_indices_nxv2i64_ext0(i32* %a) {
  ; CHECK-LABEL: @vector_splat_indices_nxv2i64_ext0(
- ; CHECK-NEXT:    [[RES:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 extractelement (<vscale x 2 x i64> shufflevector (<vscale x 2 x i64> insertelement (<vscale x 2 x i64> poison, i64 4, i32 0), <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer), i32 0)
+ ; CHECK-NEXT:    [[RES:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 4
  ; CHECK-NEXT:    ret i32* [[RES]]
  ;
    %tmp = insertelement <vscale x 2 x i64> poison, i64 4, i32 0
    %splatof4 = shufflevector <vscale x 2 x i64> %tmp, <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer
    %gep = getelementptr inbounds i32, i32* %a, <vscale x 2 x i64> %splatof4
    %res = extractelement <vscale x 2 x i32*> %gep, i32 0
    ret i32* %res
  }
[123 lines not shown]

llvm/test/Transforms/InstSimplify/ConstProp/extractelement-vscale.ll

; RUN: opt -instcombine -S < %s | FileCheck %s

; RUN: opt -S -instsimplify < %s | FileCheck %s

; CHECK-LABEL: definitely_in_bounds

; CHECK: ret i8 0

define i8 @definitely_in_bounds() {

ret i8 extractelement (<vscale x 16 x i8> zeroinitializer, i64 15)

}

; CHECK-LABEL: maybe_in_bounds

; CHECK: ret i8 extractelement (<vscale x 16 x i8> zeroinitializer, i64 16)

define i8 @maybe_in_bounds() {

ret i8 extractelement (<vscale x 16 x i8> zeroinitializer, i64 16)

}

; Examples of extracting a lane from a splat constant

define i32 @extractconstant_shuffle_in_range(i32 %v) {

; CHECK-LABEL: @extractconstant_shuffle_in_range(

; CHECK-NEXT: ret i32 1024

;

%in = insertelement <vscale x 4 x i32> undef, i32 1024, i32 0

%splat = shufflevector <vscale x 4 x i32> %in, <vscale x 4 x i32> undef, <vscale x 4 x i32> zeroinitializer

%r = extractelement <vscale x 4 x i32> %splat, i32 1

ret i32 %r

}

define i32 @extractconstant_shuffle_maybe_out_of_range(i32 %v) {

; CHECK-LABEL: @extractconstant_shuffle_maybe_out_of_range(

; CHECK-NEXT: ret i32 extractelement (<vscale x 4 x i32> shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> undef, i32 1024, i32 0), <vscale x 4 x i32> undef, <vscale x 4 x i32> zeroinitializer), i32 4)

;

%in = insertelement <vscale x 4 x i32> undef, i32 1024, i32 0

%splat = shufflevector <vscale x 4 x i32> %in, <vscale x 4 x i32> undef, <vscale x 4 x i32> zeroinitializer

%r = extractelement <vscale x 4 x i32> %splat, i32 4

ret i32 %r

}

define i32 @extractconstant_shuffle_invalid_index(i32 %v) {

; CHECK-LABEL: @extractconstant_shuffle_invalid_index(

;

%in = insertelement <vscale x 4 x i32> undef, i32 1024, i32 0

%splat = shufflevector <vscale x 4 x i32> %in, <vscale x 4 x i32> undef, <vscale x 4 x i32> zeroinitializer

%r = extractelement <vscale x 4 x i32> %splat, i32 -1

ret i32 %r

}

This is an archive of the discontinued LLVM Phabricator instance.

[InstSimplify] Add constant fold for extractelement + splat for scalable vectors
ClosedPublic
