Page MenuHomePhabricator

[X86] Enable -x86-experimental-vector-widening-legalization by default.
Needs ReviewPublic

Authored by craig.topper on Dec 3 2018, 10:33 PM.



This patch changes our defualt legalization behavior for narrow vectors with i8/i16/i32/i64 scalar types from promotion to widening. This keeps the elements widths the same and pads with undef elements. I believe this is a better legalization strategy. But it carries some issues due to the fragmented vector ISA. For example i8 shifts and multiplies get widened and then later have to be promoted/split into vXi16 vectors.

I'm sure there are still some issues in here, but I wanted to get this patch up so we could start spotting the remaining issues.

Diff Detail

Event Timeline

craig.topper created this revision.Dec 3 2018, 10:33 PM

Most of the test case changes make sense to me.

Places where we have lots more instructions are because we now need to zero-extend when using weird vector types in test cases (<4 x i8>) that have no realistic model in X86. Not worrisome at all.

Some of the cost model increases are surprising to me, flagged them below.

Any benchmark data? We can try to get some with this flag flipped. @asbirlea might be able to get some good data for you with Halide which has a tendancy to stress test these kinds of legalization issues because they generate large vectors and rely on the legalization to shard them and lay them out into pipelinable vector ops.


This seems... a bit surprising.


This also seems a bit surprising.

For anyone watching this, I've asked Craig for some more time to thoroughly test this patch internally - with Christmas etc. its unlikely we'll get this done properly with much time for it to settle before the 8.00 branch so probably aim for post-8.00 branch.

@RKSimon, @gbedwell did you get any performance testing done on this?

@RKSimon, @gbedwell did you get any performance testing done on this?

@gbedwell has more details, but we've done conformance tests and didn't find anything breaking. Only very basic performance checks have been done but I don't think it found anything.