This is an archive of the discontinued LLVM Phabricator instance.

Differential D76983

[InstCombine] Transform extractelement-trunc -> bitcast-extractelement
ClosedPublic

Authored by dsprenkels on Mar 28 2020, 4:53 AM.

Details

Reviewers

RKSimon
spatel
lebedev.ri
efriedma

Commits

rG464b9aeafe29: [InstCombine] Transform extelt-trunc -> bitcast-extelt

Summary

Canonicalize the case when a scalar extracted from a vector is
truncated. Transform such cases to bitcast-then-extractelement.
This will enable erasing the truncate operation.

This commit fixes PR45314.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	20 ms	LLVM.Transforms/InstCombine::Unknown Unit Message ("")

Event Timeline

dsprenkels created this revision. Mar 28 2020, 4:53 AM

Herald added a project: Restricted Project. Mar 28 2020, 4:53 AM

Herald added subscribers: llvm-commits, hiraditya.

It was a bit unclear to me how involved the tests should be for this patch. At this time, I kept them pretty minimal, but I could add more if that's desired.

Harbormaster failed remote builds in B50799: Diff 253329! Mar 28 2020, 5:55 AM

In D76983#1947820, @dsprenkels wrote:

It was a bit unclear to me how involved the tests should be for this patch. At this time, I kept them pretty minimal, but I could add more if that's desired.

I didn't look at the logic closely, but seems to be on the right track from the tests (feel free to include Alive2 links in the review if you tested any/all of these).
I see at least 2 variations where we need more code logic (and more tests):

; In IR, crazy types are allowed.
define i13 @shrinkExtractElt_i67_to_i13_2(<3 x i67> %x) {
  %e = extractelement <3 x i67> %x, i459 2
  %t = trunc i67 %e to i13
  ret i13 %t
}

; We generally don't want to canonicalize to a form that increases the instruction count.
declare void @use(i64)
define i16 @shrinkExtractElt_i64_to_i16_2_extra_use(<3 x i64> %x) {
  %e = extractelement <3 x i64> %x, i64 2
  call void @use(i64 %e)
  %t = trunc i64 %e to i16
  ret i16 %t
}

I didn't run those through 'opt', so check for typos.

llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
855	m_ExtractElement() for matching/capturing the operands?
llvm/test/Transforms/InstCombine/pr45314_be.ll
3	(On Diff #253329)	Only need the first "E" here (and that makes it clear that the transform depends on endian and not something else in that string).

Should we be checking anything about legality of the types here?

Do not canonicalize if the extractelement operand has other users.
Do not canonicalize if it would result in an invalid bitcast instruction.

I didn't look at the logic closely, but seems to be on the right track from the tests (feel free to include Alive2 links in the review if you tested any/all of these).
I see at least 2 variations where we need more code logic (and more tests):

I added these cases to the patch.

For the case of the crazy types, I added a check that requires the respective sizes of the vector and the truncated value to be compatible. This should guard against creating an invalid bitcast. However, I still kinda allow really crazy types. Would it be better to just disable this canonicalization for types that don't look nice? (Are "nice" types somehow even defined?)

Some examples that I checked with Alive2:

Little endian:
  - <3 x i64> to i32: http://volta.cs.utah.edu:8080/z/tt-qpe
  - <3 x i64> to i32: http://volta.cs.utah.edu:8080/z/KAkFBh (idx=1)
  - <3 x i64> to i16: http://volta.cs.utah.edu:8080/z/EqzJSY (idx=2)

Big endian:
  - <3 x i64> to i32: http://volta.cs.utah.edu:8080/z/FRbHAA
  - <3 x i64> to i32: http://volta.cs.utah.edu:8080/z/oFvAdt (idx=1)
  - <3 x i64> to i16: http://volta.cs.utah.edu:8080/z/yB7Hng (idx=2)

See inline for a few code nits, and I'd make some test changes:

  - Combine the 2 separate test files into 1 to reduce duplication and make the endian diffs easier to see. Take a look at llvm/test/Transforms/InstCombine/extractelement.ll to see an example.
  - Name the file based on the transform rather than a bug name.
  - Add a test that corresponds to the larger pattern from the bug report, so we know that the sequence of transforms within instcombine works on the motivating bug:

define <4 x i64> @PR45314(<4 x i64> %x) {
  %e = extractelement <4 x i64> %x, i32 0
  %t = trunc i64 %e to i32
  %i = insertelement <8 x i32> undef, i32 %t, i32 0
  %s = shufflevector <8 x i32> %i, <8 x i32> undef, <8 x i32> zeroinitializer
  %b = bitcast <8 x i32> %s to <4 x i64>
  ret <4 x i64> %b
}

Generate the baseline CHECK lines without this code patch in place and push the tests to master as a preliminary NFC patch. Then apply this code patch, so we see the code diffs resulting from this patch in this review. That way, the tests will safely remain even if this code patch has to be reverted for some reason.

llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
847	I don't think we use the triple-slash documentation comment within a function. Either change to the standard "//" or make a helper function with the doxygen comment (similar to "foldVecTruncToExtElt" just above here).
855	Don't initialize this to nullptr. If the match fails, the variable should not be used, so initializing hides a potential compile-time warning for an unintended use of that variable. The same should be true of the existing variables, so if you want to fix them 1st with an NFC patch, that would be ok.
864	The VecNumElts factor doesn't change the modulo constraint?
865	Would be slightly easier to read if we made a local name for the common factor like: unsigned TruncRatio = VecOpScalarSize / DestScalarSize;

In D76983#1948011, @dsprenkels wrote:

However, I still kinda allow really crazy types. Would it be better to just disable this canonicalization for types that don't look nice? (Are "nice" types somehow even defined?)

That's similar to @lebedev.ri 's question about legality. We have helpers that look at the data-layout to decide if a type is target-friendly/legal:

bool InstCombiner::shouldChangeType(unsigned FromWidth, unsigned ToWidth)
bool InstCombiner::shouldChangeType(Type *From, Type *To);

But that doesn't work with vector types (...because the data-layout doesn't specify vector types AFAIK). In this transform, we are not creating any new type except for a bitcast of a vector type. So I don't think we want to limit the transform. Targets/codegen should be able to deal with bitcasts of vector types.

@spatel Thanks for the review. I will soon look into it!

Generate the baseline CHECK lines without this code patch in place and push the tests to master as a preliminary NFC patch. Then apply this code patch, so we see the code diffs resulting from this patch in this review. That way, the tests will safely remain even if this code patch has to be reverted for some reason.

I don't think I have the permissions to push directly to master. Is this easily fixed? Or should I create a separate differential for this?

llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
864	This is intentional. In this check, I need the bit-width of the whole vector. Consider this example: http://volta.cs.utah.edu:8080/z/-GpGHX

target datalayout = "e"
define i30 @src(<3 x i40> %x) {
  %e = extractelement <3 x i40> %x, i32 0
  %t = trunc i40 %e to i30
  ret i30 %t
}
define i30 @tgt(<3 x i40> %x) {
  %1 = bitcast <3 x i40> %x to <4 x i30>
  %t = extractelement <4 x i30> %1, i30 0
  ret i30 %t
}

This describes a valid case to be canonicalized by this patch, however `VecOpScalarSize % DestScalarSize == 40 % 30 != 0`. That is why I check if `(VecNumElts * VecOpScalarSize) % DestScalarSize == (3 * 40) % 30 == 0`. Should I add a comment here to clarify this?

In D76983#1948546, @dsprenkels wrote:

@spatel Thanks for the review. I will soon look into it!

Generate the baseline CHECK lines without this code patch in place and push the tests to master as a preliminary NFC patch. Then apply this code patch, so we see the code diffs resulting from this patch in this review. That way, the tests will safely remain even if this code patch has to be reverted for some reason.

I don't think I have the permissions to push directly to master. Is this easily fixed? Or should I create a separate differential for this?

My git lingo might be off here. Are you saying you don't have commit permissions for LLVM yet?
https://llvm.org/docs/DeveloperPolicy.html#obtaining-commit-access

If not, then yes please create a separate Phab review and someone will push that for you.

llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
864	Hmm...how does that example translate for big-endian? A code comment is good; another regression test is even better.

The new preliminary tests have been submitted in https://reviews.llvm.org/D77024. I removed the test for now and will submit another update when the other diff has been committed.

I also updated the diff in line with the inline comments.

My git lingo might be off here. Are you saying you don't have commit permissions for LLVM yet?

Yup. Should I ask for it? (I don't know what the etiquette is; i.e. how many patches before one should request commit access.)

llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
855	I updated every variable that this patch touches. I can probably fix the others later in an NFC patch.
864	You are right, this case would lead to an invalid transform. I fixed this, and added a test in https://reviews.llvm.org/D77024 that checks that such cases are not changed.

Harbormaster failed remote builds in B50881: Diff 253452! Mar 29 2020, 1:56 PM

In D76983#1948849, @dsprenkels wrote:

My git lingo might be off here. Are you saying you don't have commit permissions for LLVM yet?

Yup. Should I ask for it? (I don't know what the etiquette is; i.e. how many patches before one should request commit access.)

I don't see it stated explicitly anywhere, but I think 2-3 functional patches usually qualifies as a good "track record". If you have that already, then please request. If not, no problem - I can commit on your behalf.

spatel mentioned this in rG24562c6588bf: [InstCombine] Add tests for trunc (extelt x); (NFC) Baseline tests for D76983…. Mar 29 2020, 3:00 PM

Update the new trunc-extractelement.ll test file.

Harbormaster failed remote builds in B50913: Diff 253505! Mar 30 2020, 1:02 AM

I think we're pretty close now.
Added another test, so please rebase/update:
rGbc60cdcc3f8

llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
869–872	The index math can overflow: define i8 @src(<1073741824 x i32> %x) { %e = extractelement <1073741824 x i32> %x, i32 1073741823 %t = trunc i32 %e to i8 ret i8 %t } To be safe(r), use uint64_t for these variables. Normally, we want to have a regression test for a known problem like that, but I'm going to suggest not adding that because it could cost a lot of execution time for a test case that is probably not going to occur in the real-world before LLVM is long gone.

spatel mentioned this in rGbc60cdcc3f86: [InstCombine] add test for trunc-extelt; NFC. Mar 30 2020, 7:00 AM

Updated the test.
Changed the uint32_t variables to uint64_t and added an assert to catch overflows.

dsprenkels marked 2 inline comments as done. Mar 30 2020, 9:17 AM

dsprenkels marked an inline comment as done.

LGTM

This revision is now accepted and ready to land. Mar 30 2020, 9:59 AM

Harbormaster failed remote builds in B50970: Diff 253613! Mar 30 2020, 10:17 AM

Let me look at the failed test in a couple of hours.

Btw, I have commit permissions now, so I will be able to commit this myself. :)

In D76983#1950343, @dsprenkels wrote:

Let me look at the failed test in a couple of hours.

Btw, I have commit permissions now, so I will be able to commit this myself. :)

Great! A good first commit / NFC patch will update this old test file:
https://github.com/llvm/llvm-project/blob/master/llvm/test/Transforms/InstCombine/ExtractCast.ll
by using utils/update_test_checks.py to auto-generate the CHECK lines. I suspect that we're just picking up an existing test with this transform (not a bug).

Updated ExtractCast.ll test.

Should I explicitly add the data layout in ExtractCast.ll, because the added assertions are only valid on little endian platforms?

In D76983#1951210, @dsprenkels wrote:

Updated ExtractCast.ll test.

Should I explicitly add the data layout in ExtractCast.ll, because the added assertions are only valid on little endian platforms?

I don’t see a need for that. That test is practically identical to the tests we added explicitly for this transform.

Harbormaster completed remote builds in B51048: Diff 253728. Mar 30 2020, 4:57 PM

Closed by commit rG464b9aeafe29: [InstCombine] Transform extelt-trunc -> bitcast-extelt (authored by dsprenkels). Mar 31 2020, 3:19 AM

This revision was automatically updated to reflect the committed changes.

This patch triggers a regression on our side:

For the following code:

define dso_local <4 x i16> @truncate_v_v(<4 x i32> %lhs) #0 {
entry:
  %vecext = extractelement <4 x i32> %lhs, i32 0
  %conv = trunc i32 %vecext to i16
  %vecinit = insertelement <4 x i16> undef, i16 %conv, i32 0
  %vecext1 = extractelement <4 x i32> %lhs, i32 1
  %conv2 = trunc i32 %vecext1 to i16
  %vecinit3 = insertelement <4 x i16> %vecinit, i16 %conv2, i32 1
  %vecext4 = extractelement <4 x i32> %lhs, i32 2
  %conv5 = trunc i32 %vecext4 to i16
  %vecinit6 = insertelement <4 x i16> %vecinit3, i16 %conv5, i32 2
  %vecext7 = extractelement <4 x i32> %lhs, i32 3
  %conv8 = trunc i32 %vecext7 to i16
  %vecinit9 = insertelement <4 x i16> %vecinit6, i16 %conv8, i32 3
  ret <4 x i16> %vecinit9
}

The test expects to see:

define dso_local <4 x i16> @truncate_v_v(<4 x i32> %lhs) local_unnamed_addr #0 {
entry:
  %0 = trunc <4 x i32> %lhs to <4 x i16>
  ret <4 x i16> %0
}

which, in machine instructions, is mapped onto a vector trunc instruction.

But now, we see:

define dso_local <4 x i16> @truncate_v_v(<4 x i32> %lhs) local_unnamed_addr #0 {
entry:
  %0 = bitcast <4 x i32> %lhs to <8 x i16>
  %vecinit9 = shufflevector <8 x i16> %0, <8 x i16> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
  ret <4 x i16> %vecinit9
}

which is expanded into a large sequence of code going through the stack.

In D76983#1954722, @jeroen.dobbelaere wrote:
This patch triggers a regression on our side:

<...>

This looks like a simple missed transform to me, not a miscompile.

In D76983#1954751, @lebedev.ri wrote:
This looks like a simple missed transform to me, not a miscompile

I agree. We hit a phase ordering difference - SLP can reduce the chain of insert/extract to a vector trunc, but it doesn't handle the shuffle-of-bitcast. The open question is where to implement that transform. We're on the edge of instcombine vs. vector-combine if we want to do this in IR. Ie, is there consensus that forming a size-changing vector cast from a shuffle is canonical?
Alternatively, we could defer to the backend, but that could still be viewed as a regression in IR since we have more instructions now.

In D76983#1954774, @spatel wrote:
Ie, is there consensus that forming a size-changing vector cast from a shuffle is canonical?

I would have guessed it is, yes.

In D76983#1955540, @lebedev.ri wrote:

Ie, is there consensus that forming a size-changing vector cast from a shuffle is canonical?

I would have guessed it is, yes.

Agree - the trunc is better for analysis, and a quick check of various backends says we do worse at codegen of the shuffle than the trunc, so that's more likely to be the expected form.
And not sure if it counts for anything, but the trunc is the more human-readable form (vs. translating shuffle indexes that depend on endian).
I'll draft a patch.

In D76983#1955660, @spatel wrote:

I'll draft a patch.

Thanks!

spatel mentioned this in rGa19b27b90e5e: [PhaseOrdering] add test for vector trunc; NFC See discussion in D76983. Apr 2 2020, 5:23 AM

spatel mentioned this in D77299: [InstCombine] convert bitcast-shuffle to vector trunc. Apr 2 2020, 5:38 AM

In D76983#1956732, @jeroen.dobbelaere wrote:

Thanks!

D77299

spatel mentioned this in rG538a8f02271b: [InstCombine] convert bitcast-shuffle to vector trunc. Apr 5 2020, 6:56 AM

Revision Contents

	Path	Size
	llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp	35 lines
	llvm/test/Transforms/InstCombine/trunc-extractelement.ll	96 lines

Diff 253505

llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp

(684 lines not shown)
}		}

Instruction *InstCombiner::visitTrunc(TruncInst &CI) {		Instruction *InstCombiner::visitTrunc(TruncInst &CI) {
if (Instruction *Result = commonCastTransforms(CI))		if (Instruction *Result = commonCastTransforms(CI))
return Result;		return Result;

Value *Src = CI.getOperand(0);		Value *Src = CI.getOperand(0);
Type *DestTy = CI.getType(), *SrcTy = Src->getType();		Type *DestTy = CI.getType(), *SrcTy = Src->getType();
		ConstantInt *Cst;

// Attempt to truncate the entire input expression tree to the destination		// Attempt to truncate the entire input expression tree to the destination
// type. Only do this if the dest type is a simple type, don't convert the		// type. Only do this if the dest type is a simple type, don't convert the
// expression tree to something weird like i93 unless the source is also		// expression tree to something weird like i93 unless the source is also
// strange.		// strange.
if ((DestTy->isVectorTy() || shouldChangeType(SrcTy, DestTy)) &&		if ((DestTy->isVectorTy() || shouldChangeType(SrcTy, DestTy)) &&
canEvaluateTruncated(Src, DestTy, *this, &CI)) {		canEvaluateTruncated(Src, DestTy, *this, &CI)) {

(52 lines not shown)	if (match(Src, m_OneUse(m_c_Or(m_LShr(m_Value(X), m_APInt(C)),
return new ICmpInst(ICmpInst::ICMP_NE, And, Zero);		return new ICmpInst(ICmpInst::ICMP_NE, And, Zero);
}		}
}		}

// FIXME: Maybe combine the next two transforms to handle the no cast case		// FIXME: Maybe combine the next two transforms to handle the no cast case
// more efficiently. Support vector types. Cleanup code by using m_OneUse.		// more efficiently. Support vector types. Cleanup code by using m_OneUse.

// Transform trunc(lshr (zext A), Cst) to eliminate one type conversion.		// Transform trunc(lshr (zext A), Cst) to eliminate one type conversion.
Value *A = nullptr; ConstantInt *Cst = nullptr;		Value *A = nullptr;
if (Src->hasOneUse() &&		if (Src->hasOneUse() &&
match(Src, m_LShr(m_ZExt(m_Value(A)), m_ConstantInt(Cst)))) {		match(Src, m_LShr(m_ZExt(m_Value(A)), m_ConstantInt(Cst)))) {
// We have three types to worry about here, the type of A, the source of		// We have three types to worry about here, the type of A, the source of
// the truncate (MidSize), and the destination of the truncate. We know that		// the truncate (MidSize), and the destination of the truncate. We know that
// ASize < MidSize and MidSize > ResultSize, but don't know the relation		// ASize < MidSize and MidSize > ResultSize, but don't know the relation
// between ASize and ResultSize.		// between ASize and ResultSize.
unsigned ASize = A->getType()->getPrimitiveSizeInBits();		unsigned ASize = A->getType()->getPrimitiveSizeInBits();

(68 lines not shown)	if (match(Src, m_Shl(m_Value(A), m_ConstantInt(Cst))) &&
ConstantInt::get(DestTy, Cst->getValue().trunc(DestSize)));		ConstantInt::get(DestTy, Cst->getValue().trunc(DestSize)));
}		}
}		}
}		}

if (Instruction *I = foldVecTruncToExtElt(CI, this))		if (Instruction *I = foldVecTruncToExtElt(CI, this))
return I;		return I;

		// Whenever an element is extracted from a vector, and then truncated,
		// canonicalize by converting it to a bitcast followed by an
		// extractelement.
		//
		// Example (little endian):
		// trunc (extractelement <4 x i64> %X, 0) to i32
		// --->
		// extractelement <8 x i32> (bitcast <4 x i64> %X to <8 x i32>), i32 0
		Value *VecOp;
		if (match(Src,
		m_OneUse(m_ExtractElement(m_Value(VecOp), m_ConstantInt(Cst))))) {
		Type *VecOpTy = VecOp->getType();
		unsigned DestScalarSize = DestTy->getScalarSizeInBits();
		unsigned VecOpScalarSize = VecOpTy->getScalarSizeInBits();
		unsigned VecNumElts = VecOpTy->getVectorNumElements();

		// A badly fit destination size would result in an invalid cast.
		if (VecOpScalarSize % DestScalarSize == 0) {
		unsigned TruncRatio = VecOpScalarSize / DestScalarSize;
		unsigned BitCastNumElts = VecNumElts * TruncRatio;
		unsigned VecOpIdx = Cst->getZExtValue();
		unsigned NewIdx =
		DL.isBigEndian()
		? (VecOpIdx + 1) * TruncRatio - 1
		: VecOpIdx * TruncRatio;

		Type *BitCastTo = VectorType::get(DestTy, BitCastNumElts);
		Value *BitCast = Builder.CreateBitCast(VecOp, BitCastTo);
		return ExtractElementInst::Create(BitCast, Builder.getInt32(NewIdx));
		}
		}

return nullptr;		return nullptr;
}		}

Instruction *InstCombiner::transformZExtICmp(ICmpInst *Cmp, ZExtInst &Zext,		Instruction *InstCombiner::transformZExtICmp(ICmpInst *Cmp, ZExtInst &Zext,
bool DoTransform) {		bool DoTransform) {
// If we are just checking for a icmp eq of a single bit and zext'ing it		// If we are just checking for a icmp eq of a single bit and zext'ing it
// to an integer, then shift the bit to the appropriate place and then		// to an integer, then shift the bit to the appropriate place and then
// cast to integer to avoid the comparison.		// cast to integer to avoid the comparison.
(1,730 lines not shown)

llvm/test/Transforms/InstCombine/trunc-extractelement.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt < %s -instcombine -S -data-layout="e" | FileCheck %s --check-prefixes=ANY,LE		; RUN: opt < %s -instcombine -S -data-layout="e" | FileCheck %s --check-prefixes=ANY,LE
; RUN: opt < %s -instcombine -S -data-layout="E" | FileCheck %s --check-prefixes=ANY,BE		; RUN: opt < %s -instcombine -S -data-layout="E" | FileCheck %s --check-prefixes=ANY,BE

define i32 @shrinkExtractElt_i64_to_i32_0(<3 x i64> %x) {		define i32 @shrinkExtractElt_i64_to_i32_0(<3 x i64> %x) {
; ANY-LABEL: @shrinkExtractElt_i64_to_i32_0(		; LE-LABEL: @shrinkExtractElt_i64_to_i32_0(
; ANY-NEXT: [[E:%.*]] = extractelement <3 x i64> [[X:%.*]], i32 0		; LE-NEXT: [[TMP1:%.*]] = bitcast <3 x i64> [[X:%.*]] to <6 x i32>
; ANY-NEXT: [[T:%.*]] = trunc i64 [[E]] to i32		; LE-NEXT: [[T:%.*]] = extractelement <6 x i32> [[TMP1]], i32 0
; ANY-NEXT: ret i32 [[T]]		; LE-NEXT: ret i32 [[T]]
		;
		; BE-LABEL: @shrinkExtractElt_i64_to_i32_0(
		; BE-NEXT: [[TMP1:%.*]] = bitcast <3 x i64> [[X:%.*]] to <6 x i32>
		; BE-NEXT: [[T:%.*]] = extractelement <6 x i32> [[TMP1]], i32 1
		; BE-NEXT: ret i32 [[T]]
;		;
%e = extractelement <3 x i64> %x, i32 0		%e = extractelement <3 x i64> %x, i32 0
%t = trunc i64 %e to i32		%t = trunc i64 %e to i32
ret i32 %t		ret i32 %t
}		}

define i32 @shrinkExtractElt_i64_to_i32_1(<3 x i64> %x) {		define i32 @shrinkExtractElt_i64_to_i32_1(<3 x i64> %x) {
; ANY-LABEL: @shrinkExtractElt_i64_to_i32_1(		; LE-LABEL: @shrinkExtractElt_i64_to_i32_1(
; ANY-NEXT: [[E:%.*]] = extractelement <3 x i64> [[X:%.*]], i32 1		; LE-NEXT: [[TMP1:%.*]] = bitcast <3 x i64> [[X:%.*]] to <6 x i32>
; ANY-NEXT: [[T:%.*]] = trunc i64 [[E]] to i32		; LE-NEXT: [[T:%.*]] = extractelement <6 x i32> [[TMP1]], i32 2
; ANY-NEXT: ret i32 [[T]]		; LE-NEXT: ret i32 [[T]]
		;
		; BE-LABEL: @shrinkExtractElt_i64_to_i32_1(
		; BE-NEXT: [[TMP1:%.*]] = bitcast <3 x i64> [[X:%.*]] to <6 x i32>
		; BE-NEXT: [[T:%.*]] = extractelement <6 x i32> [[TMP1]], i32 3
		; BE-NEXT: ret i32 [[T]]
;		;
%e = extractelement <3 x i64> %x, i32 1		%e = extractelement <3 x i64> %x, i32 1
%t = trunc i64 %e to i32		%t = trunc i64 %e to i32
ret i32 %t		ret i32 %t
}		}

define i32 @shrinkExtractElt_i64_to_i32_2(<3 x i64> %x) {		define i32 @shrinkExtractElt_i64_to_i32_2(<3 x i64> %x) {
; ANY-LABEL: @shrinkExtractElt_i64_to_i32_2(		; LE-LABEL: @shrinkExtractElt_i64_to_i32_2(
; ANY-NEXT: [[E:%.*]] = extractelement <3 x i64> [[X:%.*]], i32 2		; LE-NEXT: [[TMP1:%.*]] = bitcast <3 x i64> [[X:%.*]] to <6 x i32>
; ANY-NEXT: [[T:%.*]] = trunc i64 [[E]] to i32		; LE-NEXT: [[T:%.*]] = extractelement <6 x i32> [[TMP1]], i32 4
; ANY-NEXT: ret i32 [[T]]		; LE-NEXT: ret i32 [[T]]
		;
		; BE-LABEL: @shrinkExtractElt_i64_to_i32_2(
		; BE-NEXT: [[TMP1:%.*]] = bitcast <3 x i64> [[X:%.*]] to <6 x i32>
		; BE-NEXT: [[T:%.*]] = extractelement <6 x i32> [[TMP1]], i32 5
		; BE-NEXT: ret i32 [[T]]
;		;
%e = extractelement <3 x i64> %x, i32 2		%e = extractelement <3 x i64> %x, i32 2
%t = trunc i64 %e to i32		%t = trunc i64 %e to i32
ret i32 %t		ret i32 %t
}		}

define i16 @shrinkExtractElt_i64_to_i16_0(<3 x i64> %x) {		define i16 @shrinkExtractElt_i64_to_i16_0(<3 x i64> %x) {
; ANY-LABEL: @shrinkExtractElt_i64_to_i16_0(		; LE-LABEL: @shrinkExtractElt_i64_to_i16_0(
; ANY-NEXT: [[E:%.*]] = extractelement <3 x i64> [[X:%.*]], i16 0		; LE-NEXT: [[TMP1:%.*]] = bitcast <3 x i64> [[X:%.*]] to <12 x i16>
; ANY-NEXT: [[T:%.*]] = trunc i64 [[E]] to i16		; LE-NEXT: [[T:%.*]] = extractelement <12 x i16> [[TMP1]], i32 0
; ANY-NEXT: ret i16 [[T]]		; LE-NEXT: ret i16 [[T]]
		;
		; BE-LABEL: @shrinkExtractElt_i64_to_i16_0(
		; BE-NEXT: [[TMP1:%.*]] = bitcast <3 x i64> [[X:%.*]] to <12 x i16>
		; BE-NEXT: [[T:%.*]] = extractelement <12 x i16> [[TMP1]], i32 3
		; BE-NEXT: ret i16 [[T]]
;		;
%e = extractelement <3 x i64> %x, i16 0		%e = extractelement <3 x i64> %x, i16 0
%t = trunc i64 %e to i16		%t = trunc i64 %e to i16
ret i16 %t		ret i16 %t
}		}

define i16 @shrinkExtractElt_i64_to_i16_1(<3 x i64> %x) {		define i16 @shrinkExtractElt_i64_to_i16_1(<3 x i64> %x) {
; ANY-LABEL: @shrinkExtractElt_i64_to_i16_1(		; LE-LABEL: @shrinkExtractElt_i64_to_i16_1(
; ANY-NEXT: [[E:%.*]] = extractelement <3 x i64> [[X:%.*]], i16 1		; LE-NEXT: [[TMP1:%.*]] = bitcast <3 x i64> [[X:%.*]] to <12 x i16>
; ANY-NEXT: [[T:%.*]] = trunc i64 [[E]] to i16		; LE-NEXT: [[T:%.*]] = extractelement <12 x i16> [[TMP1]], i32 4
; ANY-NEXT: ret i16 [[T]]		; LE-NEXT: ret i16 [[T]]
		;
		; BE-LABEL: @shrinkExtractElt_i64_to_i16_1(
		; BE-NEXT: [[TMP1:%.*]] = bitcast <3 x i64> [[X:%.*]] to <12 x i16>
		; BE-NEXT: [[T:%.*]] = extractelement <12 x i16> [[TMP1]], i32 7
		; BE-NEXT: ret i16 [[T]]
;		;
%e = extractelement <3 x i64> %x, i16 1		%e = extractelement <3 x i64> %x, i16 1
%t = trunc i64 %e to i16		%t = trunc i64 %e to i16
ret i16 %t		ret i16 %t
}		}

define i16 @shrinkExtractElt_i64_to_i16_2(<3 x i64> %x) {		define i16 @shrinkExtractElt_i64_to_i16_2(<3 x i64> %x) {
; ANY-LABEL: @shrinkExtractElt_i64_to_i16_2(		; LE-LABEL: @shrinkExtractElt_i64_to_i16_2(
; ANY-NEXT: [[E:%.*]] = extractelement <3 x i64> [[X:%.*]], i16 2		; LE-NEXT: [[TMP1:%.*]] = bitcast <3 x i64> [[X:%.*]] to <12 x i16>
; ANY-NEXT: [[T:%.*]] = trunc i64 [[E]] to i16		; LE-NEXT: [[T:%.*]] = extractelement <12 x i16> [[TMP1]], i32 8
; ANY-NEXT: ret i16 [[T]]		; LE-NEXT: ret i16 [[T]]
		;
		; BE-LABEL: @shrinkExtractElt_i64_to_i16_2(
		; BE-NEXT: [[TMP1:%.*]] = bitcast <3 x i64> [[X:%.*]] to <12 x i16>
		; BE-NEXT: [[T:%.*]] = extractelement <12 x i16> [[TMP1]], i32 11
		; BE-NEXT: ret i16 [[T]]
;		;
%e = extractelement <3 x i64> %x, i16 2		%e = extractelement <3 x i64> %x, i16 2
%t = trunc i64 %e to i16		%t = trunc i64 %e to i16
ret i16 %t		ret i16 %t
}		}
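The expected lane indices in the LE and BE check lines above all follow one rule. Bitcasting a vector of wide elements to narrow lanes turns each wide lane into `Ratio = SrcBits / DstBits` consecutive narrow lanes. On a little-endian target the low-order bits kept by the `trunc` sit in the first narrow lane of that group; on a big-endian target they sit in the last one. The sketch below reproduces those indices. It is a hypothetical helper (`shrunkLaneIndex` is an illustrative name, not an LLVM API and not the patch's actual code) and assumes the source width is an exact multiple of the destination width.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not taken from the patch: maps the constant index of
// the wide lane being extracted to the narrow lane that holds its low-order
// bits after bitcasting the vector to DstBits-wide lanes.
//   idx       - constant extractelement index into the wide vector
//   srcBits   - width of the wide element (e.g. 64 for i64)
//   dstBits   - width of the truncated result (e.g. 32 for i32)
//   bigEndian - true when the data layout is big-endian
uint64_t shrunkLaneIndex(uint64_t idx, unsigned srcBits, unsigned dstBits,
                         bool bigEndian) {
  // Assumption: each wide lane splits into a whole number of narrow lanes.
  assert(dstBits > 0 && dstBits < srcBits && srcBits % dstBits == 0);
  const uint64_t ratio = srcBits / dstBits; // narrow lanes per wide lane
  // Little-endian: low bits are stored first, i.e. the first lane of the group.
  // Big-endian: low bits are stored last, i.e. the last lane of the group.
  return bigEndian ? (idx + 1) * ratio - 1 : idx * ratio;
}
```

For example, `shrunkLaneIndex(1, 64, 16, true)` yields 7, matching the BE check in `@shrinkExtractElt_i64_to_i16_1`. The same rule explains the splat masks in the PR45314 test: element 0 of `<4 x i64>` truncated to `i32` maps to lane 0 on little-endian (`zeroinitializer`) and to lane 1 on big-endian.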

; Do not optimize if it would result in an invalid bitcast instruction.		; Do not optimize if it would result in an invalid bitcast instruction.
define i13 @shrinkExtractElt_i67_to_i13_2(<3 x i67> %x) {		define i13 @shrinkExtractElt_i67_to_i13_2(<3 x i67> %x) {
[32 lines not shown]	;
%e = extractelement <3 x i64> %x, i64 2		%e = extractelement <3 x i64> %x, i64 2
call void @use(i64 %e)		call void @use(i64 %e)
%t = trunc i64 %e to i16		%t = trunc i64 %e to i16
ret i16 %t		ret i16 %t
}		}
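The test comment "Do not optimize if it would result in an invalid bitcast instruction" covers inputs such as `<3 x i67>` truncated to `i13`. A `<3 x i67>` vector is 201 bits, and no `<N x i13>` vector has that size, so there is no bitcast to rewrite through. Below is a minimal sketch of such a guard. It assumes the condition is that the source element width is an exact multiple of the destination width. `canShrinkViaBitcast` is a hypothetical name, and the condition is not necessarily the patch's exact check.

```cpp
#include <cassert>

// Hypothetical guard (an assumption, not necessarily the patch's condition):
// only rewrite trunc(extractelement) as extractelement(bitcast) when every
// wide element splits into a whole number of destination-sized lanes.
bool canShrinkViaBitcast(unsigned srcElemBits, unsigned dstBits) {
  // trunc always narrows, so valid IR has 0 < dstBits < srcElemBits.
  assert(dstBits > 0 && dstBits < srcElemBits);
  return srcElemBits % dstBits == 0;
}
```

Checking the per-element width rather than only the total vector width matters. An integer ratio guarantees that the low bits of each wide lane coincide with exactly one narrow lane, which is what lets a single `extractelement` replace the `trunc`.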

; Check to ensure PR45314 remains fixed.		; Check to ensure PR45314 remains fixed.
define <4 x i64> @PR45314(<4 x i64> %x) {		define <4 x i64> @PR45314(<4 x i64> %x) {
; ANY-LABEL: @PR45314(		; LE-LABEL: @PR45314(
; ANY-NEXT: [[E:%.*]] = extractelement <4 x i64> [[X:%.*]], i32 0		; LE-NEXT: [[TMP1:%.*]] = bitcast <4 x i64> [[X:%.*]] to <8 x i32>
; ANY-NEXT: [[T:%.*]] = trunc i64 [[E]] to i32		; LE-NEXT: [[S:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> undef, <8 x i32> zeroinitializer
; ANY-NEXT: [[I:%.*]] = insertelement <8 x i32> undef, i32 [[T]], i32 0		; LE-NEXT: [[B:%.*]] = bitcast <8 x i32> [[S]] to <4 x i64>
; ANY-NEXT: [[S:%.*]] = shufflevector <8 x i32> [[I]], <8 x i32> undef, <8 x i32> zeroinitializer		; LE-NEXT: ret <4 x i64> [[B]]
; ANY-NEXT: [[B:%.*]] = bitcast <8 x i32> [[S]] to <4 x i64>		;
; ANY-NEXT: ret <4 x i64> [[B]]		; BE-LABEL: @PR45314(
		; BE-NEXT: [[TMP1:%.*]] = bitcast <4 x i64> [[X:%.*]] to <8 x i32>
		; BE-NEXT: [[S:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> undef, <8 x i32> <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
		; BE-NEXT: [[B:%.*]] = bitcast <8 x i32> [[S]] to <4 x i64>
		; BE-NEXT: ret <4 x i64> [[B]]
;		;
%e = extractelement <4 x i64> %x, i32 0		%e = extractelement <4 x i64> %x, i32 0
%t = trunc i64 %e to i32		%t = trunc i64 %e to i32
%i = insertelement <8 x i32> undef, i32 %t, i32 0		%i = insertelement <8 x i32> undef, i32 %t, i32 0
%s = shufflevector <8 x i32> %i, <8 x i32> undef, <8 x i32> zeroinitializer		%s = shufflevector <8 x i32> %i, <8 x i32> undef, <8 x i32> zeroinitializer
%b = bitcast <8 x i32> %s to <4 x i64>		%b = bitcast <8 x i32> %s to <4 x i64>
ret <4 x i64> %b		ret <4 x i64> %b
}		}