This patch adds a DAG combine for (any_extend (extract_vector_elt v, i)) ->
(extract_vector_elt v, i). The extend is unnecessary since it is performed
implicitly by the extract.
Details
Diff Detail
Event Timeline
lib/Target/AArch64/AArch64InstrInfo.td | ||
---|---|---|
3809 | Any idea why we don't have (anyext (vector_extract (...) ) -> vector_extract ( )? I see that vector_extract is EXTRACT_VECTOR_ELT, which according to ISDOpcodes.h should be able to perform the anyext operation. If that would happen, wouldn't one of the patterns above match this? |
Thanks very much for the quick reply, Silviu! Please see my response inline.
lib/Target/AArch64/AArch64InstrInfo.td | ||
---|---|---|
3809 | I see what you're getting at, but as far as I know (please correct me if I'm wrong), "vector_extract" can't be used in an output pattern. |
lib/Target/AArch64/AArch64InstrInfo.td | ||
---|---|---|
3809 | I think what Silvu is getting at is that maybe a better place to fix this would be to add a dag-combine that did (anyext (vector_extract (x)) -> vector_extract(x) (with the wider type). |
Reimplemented change as a DAG combine following feedback from Silviu and Geoff.
I first attemted to make this a target-independent combine, but that broke a lot of X86 tests. If we like the approach, I think it's best to limit the combine to AArch64 for now.
lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
8444 ↗ | (On Diff #42932) | I suppose we need the sign extend because ISD::EXTRACT_VECTOR_ELT gets matched to "smov"? Perhaps that's something that needs changing - surely we should match (sign_ext_inreg (extract_vector_elt)) to smov and (zero_ext_inreg (extract_vector_elt)) to umov. I think there needs to be more type checking going on here. Consider: %0 = EXTRACT_VECTOR_ELT <8 x i16> %a ; %0 is i16 %1 = ANYEXT i16 %0 to i32 ; %1 is i32 %2 = SIGN_EXTEND_INREG i32 %0 (from i8 to i32) ; bit[7] is replicated into bit[8..31] This cannot be an SMOVv8i16, because the sign bit is bit[7], not bit[15]. This testcase shows it: define i64 @matt(<8 x i16> %a) { %b = extractelement <8 x i16> %a, i32 1 %c = trunc i16 %b to i8 %d = sext i8 %c to i64 ret i64 %d } SelectionDAG has 11 nodes: t0: ch = EntryToken t2: v8i16,ch = CopyFromReg t0, Register:v8i16 %vreg0 t14: i32 = extract_vector_elt t2, Constant:i64<1> t15: i64 = any_extend t14 t13: i64 = sign_extend_inreg t15, ValueType:ch:i8 t9: ch,glue = CopyToReg t0, Register:i64 %X0, t13 t10: ch = AArch64ISD::RET_FLAG t9, Register:i64 %X0, t9:1 |
lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
8444 ↗ | (On Diff #43010) | James, You're right. Thanks for joining the review! We also need to check that the vector element type is the same as the type we're sign extending from. Yes, this combine is intended to canonicalize the patterns we match to SMOV. The existing patterns attempt match (sext_inreg (vector_extract v, i)) for the legal types. However, the 16xi8-to-i64 case and the 8xi16-to-i64 case (which we have patterns for) don't actually follow this form because there is an added any_extend. An any_extend from i32 to i64 is added by the type legalizer because it legalizes the extracted elements to i32. So for example, for the case below: define i64 @f(<16 x i8> %a) { entry: %e = extractelement <16 x i8> %a, i32 2 %b = zext i8 %e to i64 ret i64 %b } we currently generate: umov w8, v0.b[2] sxtb x0, w8 With this DAG combine, we will instead generate the following code because the existing 16xi8-to-i64 pattern will fire. smov x0, v0.b[2] |
lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
8444 ↗ | (On Diff #43010) | Sorry, that should have been a sext in the test case above. |
James, you may need to go into Phabricator and unblock the patch.
From: James Molloy [mailto:james@jamesmolloy.co.uk]
Sent: Wednesday, December 16, 2015 11:55 AM
To: reviews+D15515+public+f270c3ee04f7413b@reviews.llvm.org; Matthew Simpson; t.p.northover@gmail.com; silviu.baranga@arm.com; renato.golin@linaro.org; mcrosier@codeaurora.org; gberry@codeaurora.org; james.molloy@arm.com
Cc: kanheim@a-bix.com; llvm-commits@lists.llvm.org
Subject: Re: [PATCH] D15515: [AArch64] Add DAG combine for extract extend pattern
That makes sense to me. LGTM!
Thanks very much Silviu, James, Geoff, and Chad for the feedback.
I reverted this patch because it was breaking some internal tests. After looking at the failing test case, I don't think the DAG is the right place to do this. The issue in the failing test was that the sign_extend was first matched to another pattern (a widening operation like ssubw). Without the sign_extend, the widened extract_vector_elt could no longer be matched to anything. See the code below for an example.
define i64 @f(<8 x i16> %a, i64 %b) { %d = extractelement <8 x i16> %a, i32 2 %e = sext i16 %d to i64 %t1 = sub nsw i64 %b, %e ret i64 %t1 }
I think the initial revision of the patch, which added the two additional patterns to the target description, is the safer approach. We'll match the smov cases, but won't risk making the DAG illegal. Were there any objections to the initial revision?
If I understand correctly, you may be able to just add patterns for the vector_extracts without extends/ands to handle this case (along with the current dagcombine).
I'm not sure this will buy you anything more than just using the original patterns you suggested though.
Any idea why we don't have (anyext (vector_extract (...) ) -> vector_extract ( )?
I see that vector_extract is EXTRACT_VECTOR_ELT, which according to ISDOpcodes.h should be able to perform the anyext operation. If that would happen, wouldn't one of the patterns above match this?