Canonicalize the case when a scalar extracted from a vector is
truncated. Transform such cases to bitcast-then-extractelement.
This will enable erasing the truncate operation.
This commit fixes PR45314.
Differential D76983
[InstCombine] Transform extractelement-trunc -> bitcast-extractelement dsprenkels on Mar 28 2020, 4:53 AM. Authored by
Details Canonicalize the case when a scalar extracted from a vector is This commit fixes PR45314.
Diff Detail
Unit Tests Event TimelineComment Actions It was a bit unclear to me how involved the tests should be for this patch. At this time, I kept them pretty minimal, but I could add more if that's desired. Comment Actions I didn't look at the logic closely, but seems to be on the right track from the tests (feel free to include Alive2 links in the review if you tested any/all of these). ; In IR, crazy types are allowed. define i13 @shrinkExtractElt_i67_to_i13_2(<3 x i67> %x) { %e = extractelement <3 x i67> %x, i459 2 %t = trunc i67 %e to i13 ret i13 %t } ; We generally don't want to canonicalize to a form that increases the instruction count. declare void @use(i64) define i16 @shrinkExtractElt_i64_to_i16_2_extra_use(<3 x i64> %x) { %e = extractelement <3 x i64> %x, i64 2 call void @use(i64 %e) %t = trunc i64 %e to i16 ret i16 %t } I didn't run those through 'opt', so check for typos.
Comment Actions Do not canonicalize if the extractvector operand has other users. Comment Actions
I added these cases to the patch. For the case of the crazy types I added a check that requires the respective sizes of the vector and the truncated value to be compatible. This should prevent against an invalid bitcast being created. However, I still kinda allow really crazy types. Would it be better to just disable this canonicalization for types that don't look nice ? (Are "nice" types somehow even defined?) Some examples that I checked with Alive2: Little endian - <3 x i64> to i32: http://volta.cs.utah.edu:8080/z/tt-qpe - <3 x i64> to i32: http://volta.cs.utah.edu:8080/z/KAkFBh (idx=1) - <3 x i64> to i16: http://volta.cs.utah.edu:8080/z/EqzJSY (idx=2) Big endian: - <3 x i64> to i32: http://volta.cs.utah.edu:8080/z/FRbHAA - <3 x i64> to i32: http://volta.cs.utah.edu:8080/z/oFvAdt (idx=1) - <3 x i64> to i16: http://volta.cs.utah.edu:8080/z/yB7Hng (idx=2) Comment Actions See inline for a few code nits, and I'd make some test changes:
define <4 x i64> @PR45314(<4 x i64> %x) { %e = extractelement <4 x i64> %x, i32 0 %t = trunc i64 %e to i32 %i = insertelement <8 x i32> undef, i32 %t, i32 0 %s = shufflevector <8 x i32> %i, <8 x i32> undef, <8 x i32> zeroinitializer %b = bitcast <8 x i32> %s to <4 x i64> ret <4 x i64> %b }
Comment Actions That's similar to @lebedev.ri 's question about legality. We have helpers that look at the data-layout to decide if a type is target-friendly/legal: bool InstCombiner::shouldChangeType(unsigned FromWidth, unsigned ToWidth) bool InstCombiner::shouldChangeType(Type *From, Type *To); But that doesn't work with vector types (...because the data-layout doesn't specify vector types AFAIK). In this transform, we are not creating any new type except for a bitcast of a vector type. So I don't think we want to limit the transform. Targets/codegen should be able to deal with bitcasts of vector types. Comment Actions @spatel Thanks for the review. I will soon look into it!
I don't think I have the permissions do push directly to master. Is this easily fixed? Or should I create a separate differential for this?
Comment Actions My git lingo might be off here. Are you saying you don't have commit permissions for LLVM yet? If not, then yes please create a separate Phab review and someone will push that for you.
Comment Actions The new preliminary tests have been submitted in https://reviews.llvm.org/D77024. I removed the test for now and will submit another update when the other diff has been committed. I also updated the diff in line with the inline comments. Comment Actions
Yup. Should I ask for it? (I don't know what the etiquette is; i.e. how many patches before one should request commit access.)
Comment Actions I don't see it stated explicitly anywhere, but I think 2-3 functional patches usually qualifies as a good "track record". If you have that already, then please request. If not, no problem - I can commit on your behalf. Comment Actions I think we're pretty close now.
Comment Actions
Comment Actions Let me look at the failed test in a couple of hours. Btw. I have commit permissions now, so I will be able to commit this myself now. :) Comment Actions Great! A good first commit / NFC patch will update this old test file: Comment Actions
Should I explicitly add the data layout in ExtractCast.ll, because the added assertions are only valid on little endian platforms? Comment Actions I don’t see a need for that. That test is practically identical to the tests we added explicitly for this transform. Comment Actions This patch triggers a regression on our side: For the following code: define dso_local <4 x i16> @truncate_v_v(<4 x i32> %lhs) #0 { entry: %vecext = extractelement <4 x i32> %lhs, i32 0 %conv = trunc i32 %vecext to i16 %vecinit = insertelement <4 x i16> undef, i16 %conv, i32 0 %vecext1 = extractelement <4 x i32> %lhs, i32 1 %conv2 = trunc i32 %vecext1 to i16 %vecinit3 = insertelement <4 x i16> %vecinit, i16 %conv2, i32 1 %vecext4 = extractelement <4 x i32> %lhs, i32 2 %conv5 = trunc i32 %vecext4 to i16 %vecinit6 = insertelement <4 x i16> %vecinit3, i16 %conv5, i32 2 %vecext7 = extractelement <4 x i32> %lhs, i32 3 %conv8 = trunc i32 %vecext7 to i16 %vecinit9 = insertelement <4 x i16> %vecinit6, i16 %conv8, i32 3 ret <4 x i16> %vecinit9 } The tests expects to see: define dso_local <4 x i16> @truncate_v_v(<4 x i32> %lhs) local_unnamed_addr #0 { entry: %0 = trunc <4 x i32> %lhs to <4 x i16> ret <4 x i16> %0 } which, in machine instructions, is mapped onto a vector trunc instruction. But now, we see: define dso_local <4 x i16> @truncate_v_v(<4 x i32> %lhs) local_unnamed_addr #0 { entry: %0 = bitcast <4 x i32> %lhs to <8 x i16> %vecinit9 = shufflevector <8 x i16> %0, <8 x i16> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6> ret <4 x i16> %vecinit9 } which is expanded into a large sequence of code going through the stack. Comment Actions I agree. We hit a phase ordering difference - SLP can reduce the chain of insert/extract to a vector trunc, but it doesn't handle the shuffle-of-bitcast. The open question is where to implement that transform. We're on the edge of instcombine vs. vector-combine if we want to do this in IR. Ie, is there consensus that forming a size-changing vector cast from a shuffle is canonical? Comment Actions
I would have guessed it is, yes.
Comment Actions Agree - the trunc is better for analysis, and a quick check of various backends says we do worse at codegen of the shuffle than the trunc, so that's more likely to be the expected form. |
I don't think we use the triple-slash documentation comment within a function. Either change to the standard "//" or make a helper function with the doxygen comment (similar to "foldVecTruncToExtElt" just above here).