Page MenuHomePhabricator

[CodeGen] Add hooks/combine to form vector extloads, and enable it on X86.

Authored by ab on Jan 9 2015, 1:08 PM.



Moved from the ML:

As the last change in my extload series, here are 3 squashed (WIP) patches
to actually form extloads on vector types.
They used to be disabled, because "None of the supported targets knows
how to perform load and sign extend on vectors in one instruction."

The first part enables the combine on legal vectors, but hides it
behind a profitability callback.
For instance, on ARM, several instructions have folded extload forms,
so it's not always beneficial to create an extload node (and trying to
match extloads is a whole 'nother can of worms).

The second part adds a combine to fold s/zextloads of illegal
(splittable) vector types, to replace it directly by multiple smaller
extloads. I'm not a big fan of this kind of pseudo-legalization in
combines. However, I tried the alternative: form illegal extloads, and
later try to split them up, but then, you sometimes generate extloads
that can't be split up, but have a valid ext+load expansion. At
vector-op legalization time, it's too late to generate this kind of
thing, so you end up forced to scalarize. It's better to just avoid
creating egregiously illegal nodes.

Finally, the last part enables this all, unconditionally, on X86.

Note that the splitting combine is happy with "custom" extloads. As
is, this bypasses the actual custom lowering, and just unrolls the
extload. But from what I've seen, this is still much better than the
current custom lowering, which does some kind of unrolling at the end
anyway (see for instance load_sext_4i8_to_4i64 on SSE2, and the added

Also note that there's a regression in widen_load-2.ll, where
we can no longer fold the load. I'll look into that later.

Anyway: as can be seen from the nice testcase cleanups, there's
something to be done here. The combines feel a bit dirty, but I don't
see a better alternative. Finally, I didn't see changes on the
testsuite (SSE2 X86-64, I'll try SSE4.1 and AVX2 as well.)

P.S.: I squashed all three because I don't think it makes it harder to
review, just longer. The three changes are nicely isolated, and will
be committed separately. I'll split into 3 threads if desired.

Depends on D6533.


Diff Detail


Event Timeline

ab updated this revision to Diff 17942.Jan 9 2015, 1:08 PM
ab retitled this revision from to [CodeGen] Add hooks/combine to form vector extloads, and enable it on X86..
ab updated this object.
ab edited the test plan for this revision. (Show Details)
ab added a reviewer: chandlerc.
ab added a subscriber: Unknown Object (MLST).
arsenm added a subscriber: arsenm.Jan 9 2015, 1:36 PM
arsenm added inline comments.
5334–5335 ↗(On Diff #17942)

I don't think this will work correctly for 3 vectors, but it probably doesn't matter

5336 ↗(On Diff #17942)

This should use .getStoreSize()

5671–5682 ↗(On Diff #17942)

This looks like the same in the sextload handling. Should probably be moved to a new function

5688 ↗(On Diff #17942)

This should also use getStoreSize()

5694–5698 ↗(On Diff #17942)

It might be helpful to add a new version of getExtLoad based on the version that takes the MMO and an offset rather than having to pass all of the separate arguments

20109 ↗(On Diff #17942)

Missing override

chandlerc edited edge metadata.Jan 9 2015, 1:40 PM

Wow this looks amazing. I'm glad you were able to make it all the way through to fixing this properly (I wasn't when I looked here, I ran away screaming).

Some comments inline...

1512 ↗(On Diff #17942)

Do we you "ExtLd" in APIs elsewhere? this kind of extreme abbreviation, especially if deviating from other abbreviations, makes the APIs really hard to understand and find.

5660 ↗(On Diff #17942)

This code is extremely similar to the sext case. Can it be factored out with an appropriate predicate or input instruction tag or something?

5665–5671 ↗(On Diff #17942)

Even if you can't put this into a common function with sext, at least pull both into their own functions so that you can use early exit for all of this rather than boolean variables.

76–79 ↗(On Diff #17942)

This is both weird and unfortunate... Do understand why this happens?

ab updated this revision to Diff 18019.Jan 12 2015, 8:23 AM
ab edited edge metadata.
ab added inline comments.
1512 ↗(On Diff #17942)

I agree, and originally wanted to use the closer-to-ubiquitous LoadExt. However, there's enableExtLdPromotion, which is the most similar API so should be similarly named.

I guess I could rename both to LoadExt instead, how does that sound? That would stay consistent with ISD::LoadExtType, and the various *LoadExt* legality functions.

5334–5335 ↗(On Diff #17942)

You're right, I added a check for isPow2VectorType.

5660 ↗(On Diff #17942)

It should be trivial to factor it out. Other similar combines as well (I'll take care of it separately.)

5694–5698 ↗(On Diff #17942)

I thought so as well, but two things would make it confusing IMO: 1) the offset operand is for indexed loads only, which is a concept different from an ADDed base pointer. Also, less importantly, 2) we're not sure we can reuse the previous MMO, as the types may differ.
Or maybe you meant something else?

76–79 ↗(On Diff #17942)

This was actually a rebasing mistake, and doesn't happen on a clean apply of the patch. Sorry for the noise.

chandlerc added inline comments.Jan 12 2015, 9:24 AM
1512 ↗(On Diff #17942)

We're so inconsistent here that I don't really care about what the old code happened to name things provided we're consistent going forward and converge on that. Given that, I'd prefer the minimal amount of abbreviation that is sane, and I think spelling out "load" but abbreviating "ext" is a fine compromise there.

5660 ↗(On Diff #17942)

Please don't add duplicated code and then factor it afterward.

Either do the minimal amount of factoring in this change so that we don't have to review 2x the code, or do the factoring first, and then the functional change.

ab added a comment.Jan 20 2015, 1:44 PM



hfinkel added inline comments.
5227 ↗(On Diff #18019)

Please make this comment more verbose. We might generate multiple loads and then combine them together, and it is not clear here whether this combining happens to the results of the sext or the sext's operand.

5274 ↗(On Diff #18019)

Is this alignment correct? Aren't these loads smaller than the original one?

My only comments beyond what Hal has already mentioned are real nits:


Can you use DL everywhere? It's odd that in one place you use SDLoc(N0). It may be correct, but if so, it's likely not obvious to the reader why we need to do something different here.

Finally, I'd find the indentation more consistent with clang-format. But now we're waaay down into minor details. This is generally looking much better.

ab updated this revision to Diff 19057.Jan 30 2015, 2:00 PM
  • Use smaller alignment for the split loads.
  • Improve formatting, names.
  • Don't accept ANYEXTs, since it's not testable or supported yet.
hfinkel accepted this revision.Feb 2 2015, 7:24 PM
hfinkel added a reviewer: hfinkel.


This revision is now accepted and ready to land.Feb 2 2015, 7:24 PM
This revision was automatically updated to reflect the committed changes.