Moved from the ML:
As the last change in my extload series, here are 3 squashed (WIP) patches
to actually form extloads on vector types.
They used to be disabled, because "None of the supported targets knows
how to perform load and sign extend on vectors in one instruction."
The first part enables the combine on legal vectors, but hides it
behind a profitability callback.
For instance, on ARM, several instructions have folded extload forms,
so it's not always beneficial to create an extload node (and trying to
match extloads is a whole 'nother can of worms).
The second part adds a combine to fold s/zextloads of illegal
(splittable) vector types, to replace it directly by multiple smaller
extloads. I'm not a big fan of this kind of pseudo-legalization in
combines. However, I tried the alternative: form illegal extloads, and
later try to split them up, but then, you sometimes generate extloads
that can't be split up, but have a valid ext+load expansion. At
vector-op legalization time, it's too late to generate this kind of
thing, so you end up forced to scalarize. It's better to just avoid
creating egregiously illegal nodes.
Finally, the last part enables this all, unconditionally, on X86.
Note that the splitting combine is happy with "custom" extloads. As
is, this bypasses the actual custom lowering, and just unrolls the
extload. But from what I've seen, this is still much better than the
current custom lowering, which does some kind of unrolling at the end
anyway (see for instance load_sext_4i8_to_4i64 on SSE2, and the added
Also note that there's a regression in widen_load-2.ll, where
we can no longer fold the load. I'll look into that later.
Anyway: as can be seen from the nice testcase cleanups, there's
something to be done here. The combines feel a bit dirty, but I don't
see a better alternative. Finally, I didn't see changes on the
testsuite (SSE2 X86-64, I'll try SSE4.1 and AVX2 as well.)
P.S.: I squashed all three because I don't think it makes it harder to
review, just longer. The three changes are nicely isolated, and will
be committed separately. I'll split into 3 threads if desired.
Depends on D6533.