This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contents

llvm/
  lib/Target/X86/
    X86ISelLowering.cpp
    X86TargetTransformInfo.cpp
  test/
    Analysis/CostModel/X86/
      alternate-shuffle-cost.ll
      arith.ll
      cast.ll
      fptosi.ll
      fptoui.ll
      masked-intrinsic-cost.ll
      reduce-add-widen.ll
      reduce-add.ll
      reduce-and.ll
      reduce-mul.ll
      reduce-or.ll
      reduce-smax.ll
      reduce-smin.ll
      reduce-umax.ll
      reduce-umin.ll
      reduce-xor.ll
      shuffle-transpose.ll
      sitofp.ll
      slm-arith-costs.ll
      testshiftashr.ll
      testshiftlshr.ll
      testshiftshl.ll
      uitofp.ll
    CodeGen/X86/
      2008-09-05-sinttofp-2xi32.ll
      2009-06-05-VZextByteShort.ll
      2011-10-19-LegelizeLoad.ll
      2011-12-28-vselecti8.ll
      2011-12-8-bitcastintprom.ll
      2012-01-18-vbitcast.ll
      2012-03-15-build_vector_wl.ll
      2012-07-10-extload64.ll
      3dnow-intrinsics.ll
      4char-promote.ll
      and-load-fold.ll
      atomic-unordered.ll
      avg.ll
      avx-cvt-2.ll
      avx-fp2int.ll
      avx2-conversions.ll
      avx2-masked-gather.ll
      avx2-vbroadcast.ll
      avx512-any_extend_load.ll
      avx512-cvt.ll
      avx512-ext.ll
      avx512-intrinsics-upgrade.ll
      avx512-mask-op.ll
      avx512-trunc.ll
      avx512-vec-cmp.ll
      avx512-vec3-crash.ll
      avx512bwvl-intrinsics-upgrade.ll
      avx512vl-intrinsics-fast-isel.ll
      avx512vl-intrinsics-upgrade.ll
      bitcast-and-setcc-128.ll
      bitcast-setcc-128.ll
      bitcast-vector-bool.ll
      bitreverse.ll
      bswap-vector.ll
      buildvec-insertvec.ll
      combine-64bit-vec-binop.ll
      combine-or.ll
      complex-fastmath.ll
      cvtv2f32.ll
      extract-concat.ll
      extract-insert.ll
      f16c-intrinsics.ll
      fold-vector-sext-zext.ll
      insertelement-shuffle.ll
      known-bits-vector.ll
      known-bits.ll
      load-partial.ll
      lower-bitcast.ll
      madd.ll
      masked_compressstore.ll
      masked_expandload.ll
      masked_gather_scatter.ll
      masked_gather_scatter_widen.ll
      masked_load.ll
      masked_store.ll
      masked_store_trunc.ll
      masked_store_trunc_ssat.ll
      masked_store_trunc_usat.ll
      merge-consecutive-loads-256.ll
      mmx-arg-passing-x86-64.ll
      mmx-arith.ll
      mmx-cvt.ll
      mulvi32.ll
      oddshuffles.ll
      oddsubvector.ll
      pmaddubsw.ll
      pmulh.ll
      pointer-vector.ll
      pr14161.ll
      pr35918.ll
      pr40994.ll
      promote-vec3.ll
      promote.ll
      psubus.ll
      ret-mmx.ll
      sad.ll
      sadd_sat_vec.ll
      scalar_widen_div.ll
      select.ll
      shift-combine.ll
      shrink_vmul.ll
      shuffle-strided-with-offset-128.ll
      shuffle-strided-with-offset-256.ll
      shuffle-strided-with-offset-512.ll
      shuffle-vs-trunc-128.ll
      shuffle-vs-trunc-256.ll
      shuffle-vs-trunc-512.ll
      slow-pmulld.ll
      sse2-intrinsics-canonical.ll
      sse2-vector-shifts.ll
      ssub_sat_vec.ll
      test-shrink-bug.ll
      trunc-ext-ld-st.ll
      trunc-subvector.ll
      uadd_sat_vec.ll
      unfold-masked-merge-vector-variablemask.ll
      usub_sat_vec.ll
      vec_cast2.ll
      vec_cast3.ll
      vec_ctbits.ll
      vec_extract-mmx.ll
      vec_fp_to_int.ll
      vec_insert-5.ll
      vec_insert-7.ll
      vec_insert-mmx.ll
      vec_int_to_fp.ll
      vec_saddo.ll
      vec_smulo.ll
      vec_ssubo.ll
      vec_uaddo.ll
      vec_umulo.ll
      vec_usubo.ll
      vector-blend.ll
      vector-ext-logic.ll
      vector-gep.ll
      vector-half-conversions.ll
      vector-idiv-udiv-128.ll
      vector-idiv-udiv-256.ll
      vector-idiv-v2i32.ll
      vector-narrow-binop.ll
      vector-reduce-add.ll
      vector-reduce-and-bool.ll
      vector-reduce-and.ll
      vector-reduce-mul.ll
      vector-reduce-or-bool.ll
      vector-reduce-or.ll
      vector-reduce-smax.ll
      vector-reduce-smin.ll
      vector-reduce-umax.ll
      vector-reduce-umin.ll
      vector-reduce-xor-bool.ll
      vector-reduce-xor.ll
      vector-sext.ll
      vector-shift-ashr-sub128.ll
      vector-shift-by-select-loop.ll
      vector-shift-lshr-sub128.ll
      vector-shift-shl-sub128.ll
      vector-shuffle-128-v16.ll
      vector-shuffle-combining.ll
      vector-trunc-packus.ll
      vector-trunc-ssat.ll
      vector-trunc-usat.ll
      vector-trunc.ll
      vector-truncate-combine.ll
      vector-zext.ll
      vsel-cmp-load.ll
      vselect-avx.ll
      vselect.ll
      vshift-4.ll
      widen_arith-1.ll
      widen_arith-2.ll
      widen_arith-3.ll
      widen_bitops-0.ll
      widen_cast-1.ll
      widen_cast-2.ll
      widen_cast-3.ll
      widen_cast-4.ll
      widen_cast-5.ll
      widen_cast-6.ll
      widen_compare-1.ll
      widen_conv-1.ll
      widen_conv-2.ll
      widen_conv-3.ll
      widen_conv-4.ll
      widen_load-2.ll
      widen_shuffle-1.ll
      x86-interleaved-access.ll
      x86-shifts.ll
    Transforms/SLPVectorizer/X86/
      blending-shuffle.ll
      fptosi.ll
      fptoui.ll
      insert-element-build-vector.ll
      sitofp.ll
      uitofp.ll

Differential D55251

[X86] Enable -x86-experimental-vector-widening-legalization by default.
Closed · Public

Authored by craig.topper on Dec 3 2018, 10:33 PM.

Details

Reviewers

RKSimon
spatel
chandlerc
gbedwell

Commits

rG3de33245d2c9: [X86] Enable -x86-experimental-vector-widening-legalization by default.

Summary

This patch changes our default legalization behavior for narrow vectors with i8/i16/i32/i64 scalar types from promotion to widening. This keeps the element widths the same and pads with undef elements. I believe this is a better legalization strategy, but it carries some issues due to the fragmented vector ISA. For example, i8 shifts and multiplies get widened and then later have to be promoted/split into vXi16 vectors.

I'm sure there are still some issues in here, but I wanted to get this patch up so we could start spotting the remaining issues.
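
To make the difference between the two strategies concrete, here is a minimal Python sketch of how a `<4 x i8>` value might be mapped onto a 128-bit register under each one. It is only a toy model, not LLVM code: the helper names (`promote`, `widen`, `lanewise_add`), the fixed 128-bit register width, and the use of `None` to stand in for undef lanes are all assumptions made for illustration.

```python
# Toy model only: lanes are Python ints, undef padding lanes are None.
REG_BITS = 128  # assumed legal vector register width (e.g. an SSE xmm register)


def promote(lanes):
    """Promotion: keep the lane count, grow each lane so the vector fills the register."""
    new_bits = REG_BITS // len(lanes)
    return list(lanes), new_bits


def widen(lanes, elt_bits):
    """Widening: keep the lane width, append undef lanes until the register is full."""
    total_lanes = REG_BITS // elt_bits
    return list(lanes) + [None] * (total_lanes - len(lanes)), elt_bits


def lanewise_add(a, b, bits):
    """Add lane by lane, wrapping at the lane width; undef lanes stay undef."""
    mask = (1 << bits) - 1
    return [None if x is None or y is None else (x + y) & mask
            for x, y in zip(a, b)]


a = [200, 1, 2, 3]  # models a <4 x i8> value
b = [100, 1, 2, 3]

# Promotion: <4 x i8> is handled as 4 lanes of 32 bits each.
pa, pbits = promote(a)
pb, _ = promote(b)
promoted_sum = lanewise_add(pa, pb, pbits)

# Widening: <4 x i8> is handled as 16 lanes of 8 bits, 12 of them undef.
wa, wbits = widen(a, 8)
wb, _ = widen(b, 8)
widened_sum = lanewise_add(wa, wb, wbits)

print(pbits, promoted_sum)
print(wbits, widened_sum[:4], widened_sum.count(None))
```

In this model the promoted add leaves 300 in a 32-bit lane, so the upper bits have to be masked off before anything that depends on true i8 wraparound, whereas the widened add wraps to 44 directly in an 8-bit lane. The flip side mentioned in the summary is that the widened form is only useful when the target actually has an instruction for that lane width.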

Diff Detail

Repository

rG LLVM Github Monorepo

Build Status

Buildable 36071
Build 36070: arc lint + arc unit

Event Timeline

craig.topper created this revision. Dec 3 2018, 10:33 PM

Harbormaster completed remote builds in B25651: Diff 176549. Dec 3 2018, 10:33 PM

Most of the test case changes make sense to me.

The places where we have many more instructions are there because we now need to zero-extend when test cases use weird vector types (<4 x i8>) that have no realistic model in X86. Not worrisome at all.

Some of the cost model increases are surprising to me; I flagged them below.

Any benchmark data? We can try to get some with this flag flipped. @asbirlea might be able to get some good data for you with Halide, which has a tendency to stress-test these kinds of legalization issues because it generates large vectors and relies on legalization to shard them and lay them out into pipelinable vector ops.

test/Analysis/CostModel/X86/fptoui.ll
296–297 (On Diff #176549)	This seems... a bit surprising.
test/Analysis/CostModel/X86/reduce-add.ll
86 (On Diff #176549)	This also seems a bit surprising.

RKSimon added a reviewer: gbedwell. Dec 4 2018, 1:55 AM

Rebase

Harbormaster completed remote builds in B26021: Diff 178194. Dec 14 2018, 12:23 AM

For anyone watching this, I've asked Craig for some more time to thoroughly test this patch internally. With Christmas etc. it's unlikely we'll get this done properly with much time for it to settle before the 8.00 branch, so we should probably aim for post-8.00 branch.

anton-afanasyev added a subscriber: anton-afanasyev. Dec 21 2018, 2:23 AM

RKSimon mentioned this in D56082: [X86][SLP] Enable SLP vectorization for 128-bit horizontal X86 instructions (add, sub). Dec 27 2018, 7:28 AM

Rebase

@RKSimon, @gbedwell did you get any performance testing done on this?

In D55251#1369586, @craig.topper wrote:

@RKSimon, @gbedwell did you get any performance testing done on this?

@gbedwell has more details, but we've done conformance tests and didn't find anything breaking. Only very basic performance checks have been done, but I don't think they found anything.

Herald added a subscriber: jdoerfert. Feb 28 2019, 10:57 AM

Rebase

Herald added a project: Restricted Project. Apr 15 2019, 9:38 PM

Herald added a subscriber: hiraditya.

Harbormaster completed remote builds in B30592: Diff 195295. Apr 15 2019, 9:43 PM

Rebase

Harbormaster completed remote builds in B33928: Diff 206607. Jun 26 2019, 1:06 AM

-Rebase
-Add a hack to the reduction cost model code that keeps v2i32, v4i16, v2i16 from having the same cost as v4i32 and v8i16 due to the type legalization cost.
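
As a rough illustration of why such a hack is needed, the sketch below compares costing a reduction by its legalized (widened) type against costing it by its original lane count. Everything in it is hypothetical: the cost formula (one shuffle plus one op per halving step), the 128-bit register width, and the function names are assumptions for illustration, and none of the numbers come from the real tables in X86TargetTransformInfo.cpp.

```python
REG_BITS = 128  # assumed legal vector register width


def legalized_lanes(lanes, elt_bits):
    """Lane count after widening a narrow vector up to a full register."""
    return max(lanes, REG_BITS // elt_bits)


def toy_reduction_cost(lanes):
    """Hypothetical cost: one shuffle plus one op per halving step."""
    steps = lanes.bit_length() - 1  # log2 for power-of-two lane counts
    return 2 * steps


def cost_by_legal_type(lanes, elt_bits):
    """Cost looked up purely by the legalized type."""
    return toy_reduction_cost(legalized_lanes(lanes, elt_bits))


def cost_by_original_type(lanes, elt_bits):
    """Cost that still accounts for the original, narrower lane count."""
    return toy_reduction_cost(lanes)


for name, lanes, bits in [("v2i32", 2, 32), ("v4i32", 4, 32),
                          ("v2i16", 2, 16), ("v4i16", 4, 16),
                          ("v8i16", 8, 16)]:
    print(name, cost_by_legal_type(lanes, bits),
          cost_by_original_type(lanes, bits))
```

Under these assumptions, costing by the legalized type makes v2i32 indistinguishable from v4i32, and v2i16 and v4i16 indistinguishable from v8i16, which is the collapse the hack is described as preventing.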

Diffusion mentioned this in rL366405: [X86] Disable combineConcatVectors for vXi1 vectors. Jul 17 2019, 11:17 PM

craig.topper mentioned this in rG8da040221023: [X86] Disable combineConcatVectors for vXi1 vectors. Jul 17 2019, 11:22 PM

Rebase

Harbormaster completed remote builds in B35222: Diff 210488. Jul 17 2019, 11:31 PM

Disable the 64-bit load+store optimization for vector types. Type legalization will deal with this.

Harbormaster completed remote builds in B35223: Diff 210489. Jul 17 2019, 11:39 PM

russell.gallop added a subscriber: russell.gallop. Jul 23 2019, 8:27 AM

@craig.topper please can you rebase - there's a diff in load-partial.ll

Rebase

Thanks Craig, I've no more objections to this going in now. It's up to you if you want to commit this straight away or wait a little longer to minimise any cherry picking problems for fixes into the 9.000 release branch.

This revision is now accepted and ready to land. Jul 24 2019, 9:35 AM

xbolva00 added a subscriber: xbolva00. Jul 24 2019, 9:41 AM

xbolva00 added inline comments.

llvm/test/CodeGen/X86/widen_cast-3.ll
14–15	Not ideal?

craig.topper marked an inline comment as done. Jul 24 2019, 12:15 PM

craig.topper added inline comments.

llvm/test/CodeGen/X86/widen_cast-3.ll
14–15	Looks like something pessimistic is happening in type legalization for store widening. This test case used to generate the same code it does with this patch, but that changed in March in r357120. I'll see if we can enhance r357120 to handle this.