This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Fix a pattern match failure caused by creating improper CONCAT_VECTOR.
Needs ReviewPublic

Authored by kevin.qin on Jun 9 2014, 7:06 PM.

Details

Reviewers
t.p.northover
Summary

Hi Tim,

This patch is to fix a pattern match failure when handling IR like,
%tmp2 = fptosi <4 x double> %tmp1 to <4 x i16>

Failure reason is, ReconstructShuffle() wrongly created a CONCAT_VECTOR trying to concat 2 of v2i32 into v4i16. This patch can fix this issue and try to generate UZP1 instead of lots of MOV and INS. Please review.

Diff Detail

Event Timeline

kevin.qin updated this revision to Diff 10259.Jun 9 2014, 7:06 PM
kevin.qin retitled this revision from to [AArch64] Fix a pattern match failure caused by creating improper CONCAT_VECTOR..
kevin.qin updated this object.
kevin.qin edited the test plan for this revision. (Show Details)
kevin.qin added a reviewer: t.p.northover.
kevin.qin set the repository for this revision to rL LLVM.
kevin.qin added a subscriber: Unknown Object (MLST).
t.p.northover edited edge metadata.Jun 16 2014, 3:09 AM

Hi Kevin,

Sorry it's taken me so long to get around to this one; it's rather a complicated patch (even in comparison to the surrounding code) and I wanted to make sure I did it justice.

I've got a few questions (at the end, as with usual Phab stuff)...

Cheers.

Tim.

lib/Target/AArch64/AArch64ISelLowering.cpp
4184–4185

Isn't the key type here VT.getVectorElementType()? I'm not sure I see the logic of caring about any AssertSext/AssertZext input: we're just going to be discarding the bits anyway in your example.

In the reverse case (say extracting from v4i16 and inserting into v4i32), as far as I can see EXTRACT_VECTOR_ELT will basically be doing an anyext, so we don't have to care there either.

Or do you have examples where this kind of check is needed?

4201–4202

This seems very analogous to the VExtOffsets vector, but is handled completely differently.

Instead of mangling the actual BUILD_VECTOR node, perhaps we should create a similar OffsetMultipliers (say) variable and just record what we've done for later in the function.

4225

If this entire section of added code is moved to the start of the "for (unsigned i = 0; i < SourceVecs.size(); ++i)" loop, this block becomes redundant, we can just use the existing one at line 4176.

Hi Tim,

I have to admit this patch is a little complex to understand. I try to summarize what I did in this patch from the view of what I want to solve.

lib/Target/AArch64/AArch64ISelLowering.cpp
4182–4183

This condition check is moved from below to here, because without this check, for dags like(This can be created from lowering 'tmp2 = fptosi <4 x double> %tmp1 to <4 x i16>'),

A: v4i16 = build_vector B, C,D, E
  B: i32 = extract_vector_elt F, lane 0
    F: v2i32 = AssertSext v2i32, type i16
  C: i32 = extract_vector_elt F, lane 1
    F: v2i32 = AssertSext v2i32, type i16
  D: i32 = extract_vector_elt G, lane 0
    G: v2i32 = AssertSext v2i32, type i16
  E: i32 = extract_vector_elt G, lane 1
    G: v2i32 = AssertSext v2i32, type i16

That build_vector node will be lowering to a concat_vector combining 2 of v2i32 into v4i16, which is pattern match failed.

4184–4185

If input cannot pass element type check, we can simply get function return and the incorrect concat_vector would not be generated. But from the result, I saw lots of MOV and INS instructions are generated for this build_vector. To get better result, I added additional codes here specically for AssertSext /AssertZext, wishing to generate single UZIP1 instead. For AssertSext /AssertZext, only low bits of each element are valid, so for a v2i32 AssertSext node asserting holds i16, it can bitcast to v4i16, which lane 0, 2 are matching prior lanes and lane 1, 3 are undefined. So all lane numbers extracting on AssertSext node should adjust by multiplying 2. After that, this kind of build_vector can reconstruct to shuffle_vector and get single UZIP2 at last.

4201–4202

You mean rename Pow to OffsetMultipliers? I will do that in next version.

4225

Agreed.

kevin.qin updated this revision to Diff 10477.Jun 16 2014, 10:49 PM
kevin.qin edited edge metadata.

Here is updated patch. Please review again. Thanks.

Hi Kevin,

Isn't the key type here VT.getVectorElementType()? I'm not sure
I see the logic of caring about any AssertSext/AssertZext input:
we're just going to be discarding the bits anyway in your example.

In the reverse case (say extracting from v4i16 and inserting into
v4i32), as far as I can see EXTRACT_VECTOR_ELT will basically
be doing an anyext, so we don't have to care there either.

Or do you have examples where this kind of check is needed?

If input cannot pass element type check, we can simply get function
return and the incorrect concat_vector would not be generated.

Fair enough, *something* clearly has to be done; I'm not disputing that.

But
from the result, I saw lots of MOV and INS instructions are generated
for this build_vector. To get better result, I added additional codes here
specically for AssertSext /AssertZext, wishing to generate single UZIP1
instead. For AssertSext /AssertZext, only low bits of each element are
valid, so for a v2i32 AssertSext node asserting holds i16, it can bitcast
to v4i16, which lane 0, 2 are matching prior lanes and lane 1, 3 are
undefined.

I think that works regardless of whether the AssertSext is present or
not. All the AssertZext and AssertSext nodes tell us is that the lanes
we *don't* care about are going to be either 0 or -1 (and that's
assuming they match up with the vector we're building, they're even
less useful otherwise).

Instead of mangling the actual BUILD_VECTOR node, perhaps we should
create a similar OffsetMultipliers (say) variable and just record what we've
done for later in the function.

You mean rename Pow to OffsetMultipliers? I will do that in next version.

I mean something a bit bigger: completely remove the
"DAG.getNode(ISD::BUILD_VECTOR, ...)" code and instead save the
information needed to extract the correct lanes later, in an "int
OffsetMultipliers[2] = { 1, 1 };" variable.

Purely for discussion purposes, I've attached a quick hatchet-job I've
done on the patch along the lines I'm suggesting (clearly unpolished).
It gets the test you wrote correct, without so many special cases. I
assume you had something similar before you settled on using
AssertZext and AssertSext, do you have an example of why it's not the
right solution?

Cheers.

Tim.

Hi Tim,

After your modification, codes become cleaner and should execute faster. Thanks for this refactoring! I've tried more tests on this patch, showing no regression. As I have no suggestions on code changes any more, can I commit your version directly?

Hi Tim,

I've commited it after adding some comments. It's r211144.

Cheers.
Kevin

2014-06-17 16:58 GMT+08:00 Tim Northover <t.p.northover@gmail.com>:

Hi Kevin,

http://reviews.llvm.org/D4080