This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Change default legalization behavior of v1i32 to be widen to v2i32 instead of scalarization
ClosedPublic

Authored by HaoLiu on Jun 27 2014, 2:19 AM.

Details

Summary

Hi Tim and other reviewers,

Recently we found an assertion failure caused by scalarizing node like "v1i32 trunc v1i64". As v1i64 is legal, it will fail to scalarize v1i32.

This patch adds two hooks (like the existing hook shouldSplitVectorType()) to change the scalarization behavior:
(1) avoid scalarize v1i32 by shouldScalarizeVectorType().
(2) If avoid scalarization, the default behavior is changed to be promoted from v1i32 to v1i64, which may introduce extra operations like extend, truncate... Hook preferWidenVectorType() can change it to be widen from v1i32 to v2i32.

v1i8 and v1i16 are similar to v1i32. This patch can also have benefit on reducing FMOV.

BTW, I also think it's better to widen v2i8, v4i8, v2i16 to v8i8, v8i8, v4i16 than the default behavior of promote. But such change will cause several regression test case failures. Some of the test cases result in better code, but some of the test cases result in worse code. If we can improve the worse case, we will also have benefit.

Code review. Please.

Thanks,
-Hao

Diff Detail

Event Timeline

HaoLiu updated this revision to Diff 10918.Jun 27 2014, 2:19 AM
HaoLiu retitled this revision from to [AArch64] Change default legalization behavior of v1i32 to be widen to v2i32 instead of scalarization .
HaoLiu updated this object.
HaoLiu edited the test plan for this revision. (Show Details)
HaoLiu added a reviewer: t.p.northover.
HaoLiu added a subscriber: Unknown Object (MLST).

ping...

This issue is causing our daily test case failed. Hope it can be fixed ASAP.

Thanks,
-Hao

t.p.northover edited edge metadata.Jul 1 2014, 8:52 AM

Hi Hao,

This seems to leave us with a rather ugly mess of callbacks that cover all possible actions, at least on vector types (Scalarize, Split, Widen, Promote). There must be a tidier way to express the logic, both in Targets and in TargetLoweringBase.

If the callback was TargetLowering::getVectorAction instead, we could almost use a switch like this, with fallthroughs (provided v1iN is set to scalarize by default):

switch (getPreferredVectorAction()) {
case TypeWidenVector:
  // ...
  // FALLTHROUGH
case TypePromoteVector:
  // ...
  // FALLTHROUGH
case TypeSplitVector:
  // ...
  break;
case TypeScalarizeVector:
  // ...
  break;
}

I'm not sure I like the asymmetry there, but something must be possible.

The special accommodation later in the function for non-pow2 MVTs also seems completely redundant, at least now: there aren't any.

Cheers.

Tim.

HaoLiu updated this revision to Diff 11023.Jul 2 2014, 3:35 AM
HaoLiu edited edge metadata.

Hi Tim,

Your suggestion is very good.
I've refactored the patch by using one hook to control all the legalization actions.

Thanks,
-Hao

Hi Hao,

This looks reasonable now; I've just got some minor nits about the implementation of the callbacks. LGTM otherwise.

Cheers.

Tim.

lib/Target/AArch64/AArch64ISelLowering.cpp
7914–7916

Could you change this to "return TargetLoweringBase::getPreferredVectorAction(VT)"? It makes it clearer we're not overriding anything else.

lib/Target/NVPTX/NVPTXISelLowering.cpp
478–481 ↗(On Diff #11023)

Similarly, I think these might be clearer as "if (VT.getVectorNumElements() != 1 && VT.getScalarType() == MVT::i1)" followed by deferring to the base implementation.

arsenm added a subscriber: arsenm.Jul 2 2014, 1:51 PM
arsenm added inline comments.
lib/Target/R600/SIISelLowering.cpp
274–280 ↗(On Diff #11023)

I think this should be TypeSplitVector for any vector > 128 bits, and TypePromoteInteger for scalar types < 32-bits and no more than 4 elements. I can take care of that later though

Sorry if I'm stepping on any toes, but I'm going to commit this (having made Tim's requested changes) so that myself and some others can do stuff on top of this really excellent hook in various other backends.

Thanks so much to Hao for working on this and with Tim for building some really nice infrastructure. It's kind of amazing how many problems in different backends this will likely help us address!

lib/Target/AArch64/AArch64ISelLowering.cpp
7914–7916

FYI, I did this.

lib/Target/NVPTX/NVPTXISelLowering.cpp
478–481 ↗(On Diff #11023)

And this.

lib/Target/R600/SIISelLowering.cpp
274–280 ↗(On Diff #11023)

Similarly to other targets, I switched this to delegate to the base class. I'll let you add the custom logic you want for R600 in a follow-up patch.

chandlerc accepted this revision.Jul 2 2014, 5:31 PM
chandlerc added a reviewer: chandlerc.
This revision is now accepted and ready to land.Jul 2 2014, 5:31 PM

Hi Chandler,

I'm fine with your commit. Thanks for your work.

Thanks,
-Hao

ab added a subscriber: ab.Oct 15 2014, 4:18 PM

Hi Hao, Chandler, and Tim,

This seems to be "causing" PR20778 : by widening v1i{32,16,8}, this pushes the original problem (scalarizer asserting because the trunc operand wasn't scalarized) to v1i1.
There was a similar problem for setcc+vselect, cf. r205625.
From what I see, there are a variety of possible solutions:

1- The obvious way is to also widen v1i1. However, widening wouldn't result in any legal type.

2- Promote v1i1, to v1i64. This produces a few (correct) changes (visible in test/CodeGen/AArch64/aarch64-neon-v1i1-setcc.ll).
Also, there is a long-standing FIXME regarding the issue:

// FIXME: Currently the type legalizer can't handle VSELECT having v1i1 as
// condition. If it can legalize "VSELECT v1i1" correctly, no need to combine
// such VSELECT.

Promoting v1i1 somewhat fixes this: we still need combining, but to avoid using v1i64 instead of i64 for vselect+setcc.

3- Change the generic legalizer logic to not assert, when an operation's operands haven't been legalized before (in this case the scalarizer expecting v1i64 to be transformed to i64). This was also done in r205625, but is a pretty ugly hack.

4- Make v1i64 not legal. I'm not sure why it is legal in the first place: why not scalarize to i64? This doesn't seem like a trivial change though. However, making v1i64 and v1f64 non-legal would cleanup a few parts of the AArch64 backend.

Also, a general problem with most of these is that v1i1 isn't a SimpleVT, and most of the legalizer logic is based on (enum SimpleVT)-indexed arrays. Adding v1i1 doesn't make perfect sense (it's not really legal on any target), but it seems sensible enough.

I have a few patches to implement the promotion, but that's problematic, because it then breaks the legalizer on this kind of input:

%rhs_t = trunc <1 x i16> %rhs to <1 x i1>

What do you think? I can put my experimental patches for #2 on phabricator if desired.

Thanks!
-Ahmed

FWIW, I'd rather not use promotion here.

My inclination is to widen where doing so produces a legal type (and we
really need to have the ability to represent vNi1s as legal types in x86
even if we don't in AArch64), and scalarizing due to type illegality when
we can't widen to a legal type.

I suppose I would be OK doing promotion *once widening fails to find a
legal type* as a lowering? But from your description, it isn't clear that
this helps.

I actually agree that having v1 vector types be legal seems quite strange.
It would seem better to avoid that long-term if possible, and so I'd vote
for moving in a direction that at least doesn't make that harder?

Hi Ahmed,

I think we have to promote v1i1 to v1i64 for AArch64 backend. As we
(1) can not widen it to any vNi1 types, as there isn't any vNi1 legal.
(2) can not scalarize it. Even if we add logic to scalarize sitofp, the GetScalarizedVector method still assumes all operands have been scalarized. But as v1i64 is legal, it will never scalarize this operand. So it will result in an assertion failure in GetScalarizedVector method.
(3) can not split it, as it only has one element.

So the only choice seems to be promoting it to v1i64 in AArch64 backend.

For the issue about

%rhs_t = trunc <1 x i16> %rhs to <1 x i1>

I don't try promoting v1i1 to v1i64, so I don't know the root cause. I just think we can still fix this.

Thanks,
-Hao

ab added a comment.Oct 23 2014, 9:46 AM

I agree with Chandler, promotion isn't ideal, and from my quick experiments, it creates other problems by solving this one.
I guess we'll have to resort to working around the issue in the legalizer (Hao's solution #2, my #3, and the same as r205625).
See: http://reviews.llvm.org/differential/diff/15333/

Are you both fine with that?

(Also, I noticed a mistake in my comment: the problem is PR20777, not 778.)

-Ahmed

In D4322#23, @ab wrote:

I agree with Chandler, promotion isn't ideal, and from my quick experiments, it creates other problems by solving this one.
I guess we'll have to resort to working around the issue in the legalizer (Hao's solution #2, my #3, and the same as r205625).
See: http://reviews.llvm.org/differential/diff/15333/

Are you both fine with that?

Hi Ahmed,

Just one point about the following comments:

// For instance, this happens on AArch64, v1i1 is illegal but v1i{8,16,32}
// are widened to v1i64, which is legal.

v1i8, v1i16, v1i32 are widened to v8i8, v4i16, v2i32. If a type is legalized to v1i64, it is promoted.

I'm fine with that patch. But as I'm not quite familiar type legalize, it would be good to have someone expert like Chandler to also look it over.

Thanks,
-Hao

ab added a comment.Oct 30 2014, 4:59 PM

Right, I meant "widened to 64-bit vectors of the same scalar type"; I corrected that comment and committed the workaround as r220937.

I'm open to alternative solutions!

Thanks,

  • Ahmed
HaoLiu closed this revision.May 13 2015, 7:53 PM