This is an archive of the discontinued LLVM Phabricator instance.

[X86] Enable -x86-experimental-vector-widening-legalization by default.
ClosedPublic

Authored by craig.topper on Dec 3 2018, 10:33 PM.

Details

Summary

This patch changes our defualt legalization behavior for narrow vectors with i8/i16/i32/i64 scalar types from promotion to widening. This keeps the elements widths the same and pads with undef elements. I believe this is a better legalization strategy. But it carries some issues due to the fragmented vector ISA. For example i8 shifts and multiplies get widened and then later have to be promoted/split into vXi16 vectors.

I'm sure there are still some issues in here, but I wanted to get this patch up so we could start spotting the remaining issues.

Diff Detail

Event Timeline

craig.topper created this revision.Dec 3 2018, 10:33 PM

Most of the test case changes make sense to me.

Places where we have lots more instructions are because we now need to zero-extend when using weird vector types in test cases (<4 x i8>) that have no realistic model in X86. Not worrisome at all.

Some of the cost model increases are surprising to me, flagged them below.

Any benchmark data? We can try to get some with this flag flipped. @asbirlea might be able to get some good data for you with Halide which has a tendancy to stress test these kinds of legalization issues because they generate large vectors and rely on the legalization to shard them and lay them out into pipelinable vector ops.

test/Analysis/CostModel/X86/fptoui.ll
296–297 ↗(On Diff #176549)

This seems... a bit surprising.

test/Analysis/CostModel/X86/reduce-add.ll
86 ↗(On Diff #176549)

This also seems a bit surprising.

For anyone watching this, I've asked Craig for some more time to thoroughly test this patch internally - with Christmas etc. its unlikely we'll get this done properly with much time for it to settle before the 8.00 branch so probably aim for post-8.00 branch.

@RKSimon, @gbedwell did you get any performance testing done on this?

@RKSimon, @gbedwell did you get any performance testing done on this?

@gbedwell has more details, but we've done conformance tests and didn't find anything breaking. Only very basic performance checks have been done but I don't think it found anything.

Herald added a project: Restricted Project. · View Herald TranscriptApr 15 2019, 9:38 PM
Herald added a subscriber: hiraditya. · View Herald Transcript

-Rebase
-Add a hack to the reduction cost model code that keeps v2i32, v4i16, v2i16 from having the same cost as v4i32 and v8i16 due to the type legalization cost.

Disable the 64-bit load+store optimization for vectors types. Type legalization will deal with this.

@craig.topper please can you rebase - there's a diff in load-partial.ll

RKSimon accepted this revision.Jul 24 2019, 9:35 AM

Thanks Craig, I've no more objections to this going in now. Its up to you if you want to commit this straight away or wait a little longer to minimise any cherry picking problems for fixes into the 9.000 release branch.

This revision is now accepted and ready to land.Jul 24 2019, 9:35 AM
xbolva00 added inline comments.
llvm/test/CodeGen/X86/widen_cast-3.ll
14–15

Not ideal?

craig.topper marked an inline comment as done.Jul 24 2019, 12:15 PM
craig.topper added inline comments.
llvm/test/CodeGen/X86/widen_cast-3.ll
14–15

Looks like something pessimistic is happening in type legalization for store widening. This test case used to generate the same code it is with this patch, but was changed in March in r357120. I'll see if we can enhance r357120 to handle this.