This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen] Add hooks/combine to form vector extloads, and enable it on X86.
ClosedPublic

Authored by ab on Jan 9 2015, 1:08 PM.

Details

Summary

Moved from the ML:

As the last change in my extload series, here are 3 squashed (WIP) patches
to actually form extloads on vector types.
They used to be disabled, because "None of the supported targets knows
how to perform load and sign extend on vectors in one instruction."

The first part enables the combine on legal vectors, but hides it
behind a profitability callback.
For instance, on ARM, several instructions have folded extload forms,
so it's not always beneficial to create an extload node (and trying to
match extloads is a whole 'nother can of worms).

The second part adds a combine to fold s/zextloads of illegal
(splittable) vector types, to replace it directly by multiple smaller
extloads. I'm not a big fan of this kind of pseudo-legalization in
combines. However, I tried the alternative: form illegal extloads, and
later try to split them up, but then, you sometimes generate extloads
that can't be split up, but have a valid ext+load expansion. At
vector-op legalization time, it's too late to generate this kind of
thing, so you end up forced to scalarize. It's better to just avoid
creating egregiously illegal nodes.

Finally, the last part enables this all, unconditionally, on X86.

Note that the splitting combine is happy with "custom" extloads. As
is, this bypasses the actual custom lowering, and just unrolls the
extload. But from what I've seen, this is still much better than the
current custom lowering, which does some kind of unrolling at the end
anyway (see for instance load_sext_4i8_to_4i64 on SSE2, and the added
FIXME).

Also note that there's a regression in widen_load-2.ll, where
we can no longer fold the load. I'll look into that later.

Anyway: as can be seen from the nice testcase cleanups, there's
something to be done here. The combines feel a bit dirty, but I don't
see a better alternative. Finally, I didn't see changes on the
testsuite (SSE2 X86-64, I'll try SSE4.1 and AVX2 as well.)

P.S.: I squashed all three because I don't think it makes it harder to
review, just longer. The three changes are nicely isolated, and will
be committed separately. I'll split into 3 threads if desired.

Depends on D6533.

Thanks,
-Ahmed

Diff Detail

Event Timeline

ab updated this revision to Diff 17942.Jan 9 2015, 1:08 PM
ab retitled this revision from to [CodeGen] Add hooks/combine to form vector extloads, and enable it on X86..
ab updated this object.
ab edited the test plan for this revision. (Show Details)
ab added a reviewer: chandlerc.
ab added a subscriber: Unknown Object (MLST).
arsenm added a subscriber: arsenm.Jan 9 2015, 1:36 PM
arsenm added inline comments.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5334–5335

I don't think this will work correctly for 3 vectors, but it probably doesn't matter

5336

This should use .getStoreSize()

5671–5682

This looks like the same in the sextload handling. Should probably be moved to a new function

5688

This should also use getStoreSize()

5694–5698

It might be helpful to add a new version of getExtLoad based on the version that takes the MMO and an offset rather than having to pass all of the separate arguments

lib/Target/X86/X86ISelLowering.cpp
20109

Missing override

chandlerc edited edge metadata.Jan 9 2015, 1:40 PM

Wow this looks amazing. I'm glad you were able to make it all the way through to fixing this properly (I wasn't when I looked here, I ran away screaming).

Some comments inline...

include/llvm/Target/TargetLowering.h
1512

Do we you "ExtLd" in APIs elsewhere? this kind of extreme abbreviation, especially if deviating from other abbreviations, makes the APIs really hard to understand and find.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5660

This code is extremely similar to the sext case. Can it be factored out with an appropriate predicate or input instruction tag or something?

5665–5671

Even if you can't put this into a common function with sext, at least pull both into their own functions so that you can use early exit for all of this rather than boolean variables.

test/CodeGen/X86/widen_load-2.ll
76–79 ↗(On Diff #17942)

This is both weird and unfortunate... Do understand why this happens?

ab updated this revision to Diff 18019.Jan 12 2015, 8:23 AM
ab edited edge metadata.
ab added inline comments.
include/llvm/Target/TargetLowering.h
1512

I agree, and originally wanted to use the closer-to-ubiquitous LoadExt. However, there's enableExtLdPromotion, which is the most similar API so should be similarly named.

I guess I could rename both to LoadExt instead, how does that sound? That would stay consistent with ISD::LoadExtType, and the various *LoadExt* legality functions.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5334–5335

You're right, I added a check for isPow2VectorType.

5660

It should be trivial to factor it out. Other similar combines as well (I'll take care of it separately.)

5694–5698

I thought so as well, but two things would make it confusing IMO: 1) the offset operand is for indexed loads only, which is a concept different from an ADDed base pointer. Also, less importantly, 2) we're not sure we can reuse the previous MMO, as the types may differ.
Or maybe you meant something else?

test/CodeGen/X86/widen_load-2.ll
76–79 ↗(On Diff #17942)

This was actually a rebasing mistake, and doesn't happen on a clean apply of the patch. Sorry for the noise.

chandlerc added inline comments.Jan 12 2015, 9:24 AM
include/llvm/Target/TargetLowering.h
1512

We're so inconsistent here that I don't really care about what the old code happened to name things provided we're consistent going forward and converge on that. Given that, I'd prefer the minimal amount of abbreviation that is sane, and I think spelling out "load" but abbreviating "ext" is a fine compromise there.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5660

Please don't add duplicated code and then factor it afterward.

Either do the minimal amount of factoring in this change so that we don't have to review 2x the code, or do the factoring first, and then the functional change.

ab added a comment.Jan 20 2015, 1:44 PM

Ping!

-Ahmed

hfinkel added inline comments.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5227

Please make this comment more verbose. We might generate multiple loads and then combine them together, and it is not clear here whether this combining happens to the results of the sext or the sext's operand.

5274

Is this alignment correct? Aren't these loads smaller than the original one?

My only comments beyond what Hal has already mentioned are real nits:

s/dl/DL/

Can you use DL everywhere? It's odd that in one place you use SDLoc(N0). It may be correct, but if so, it's likely not obvious to the reader why we need to do something different here.

Finally, I'd find the indentation more consistent with clang-format. But now we're waaay down into minor details. This is generally looking much better.

ab updated this revision to Diff 19057.Jan 30 2015, 2:00 PM
  • Use smaller alignment for the split loads.
  • Improve formatting, names.
  • Don't accept ANYEXTs, since it's not testable or supported yet.
hfinkel accepted this revision.Feb 2 2015, 7:24 PM
hfinkel added a reviewer: hfinkel.

LGTM.

This revision is now accepted and ready to land.Feb 2 2015, 7:24 PM
This revision was automatically updated to reflect the committed changes.