This is an archive of the discontinued LLVM Phabricator instance.

[X86] Remove sse41 specific code from lowering v16i8 multiply
ClosedPublic

Authored by craig.topper on Mar 8 2018, 11:45 AM.

Details

Summary

With the SRAs removed from the SSE2 code as proposed in D44267, there no longer appears to be any advantage to the SSE41 code. The punpcklbw and pmovsx instructions seem to have the same latency and throughput on most CPUs. The SSE41 code also requires moving the upper 64 bits into the lower 64 bits before the sign extend can be done; the punpckhbw in the SSE2 code can do better than that.
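As a rough illustration of why the kind of byte-to-word extension should not matter here, the following scalar sketch models one way to do a v16i8 multiply through 16-bit lanes. It is not the LLVM lowering code; all function names are hypothetical, and the "self-unpack" variant is an assumption about what an unpack-based widening might leave in the upper byte. The only fact relied on is that the low 8 bits of a 16-bit product depend solely on the low 8 bits of each operand.

```python
def mul_v16i8_reference(a, b):
    """Lane-wise 8-bit multiply, keeping the low 8 bits of each product."""
    return [(x * y) & 0xFF for x, y in zip(a, b)]


def widen(byte, high):
    """Put `byte` in the low half of a 16-bit lane, with `high` as the upper byte."""
    return ((high & 0xFF) << 8) | (byte & 0xFF)


def sign_extend(byte):
    # Upper byte is a copy of the sign bit (what pmovsxbw would produce).
    return widen(byte, 0xFF if byte & 0x80 else 0x00)


def zero_extend(byte):
    # Upper byte is zero.
    return widen(byte, 0x00)


def self_unpack(byte):
    # Assumption: unpacking a register with itself duplicates each byte,
    # so the upper byte is neither a sign nor a zero extension.
    return widen(byte, byte)


def mul_v16i8_via_words(a, b, extend):
    """Widen each byte with `extend`, multiply in 16 bits, keep the low byte."""
    out = []
    for x, y in zip(a, b):
        product = (extend(x) * extend(y)) & 0xFFFF  # low 16 bits, like pmullw
        out.append(product & 0xFF)                  # truncate back to i8
    return out
```

All three widening choices give the same bytes as the reference multiply, which is consistent with the summary's point that the sign-extending form has no correctness advantage once the result is truncated.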

Event Timeline

craig.topper created this revision. Mar 8 2018, 11:45 AM
RKSimon added inline comments. Mar 8 2018, 5:12 PM
test/CodeGen/X86/vector-mul.ll
968

Why wasn't this constant folded?

craig.topper added inline comments. Mar 8 2018, 5:44 PM
test/CodeGen/X86/vector-mul.ll
968

At one point that constant pool entry was used by two vector shuffles, and I guess we refused to fold it due to the multiple uses? Later, one shuffle became UNPCKH and the other became zero_extend_vector_in_reg. The DAG combine for zero_extend_vector_in_reg was perfectly happy to overlook the multiple uses and constant fold it; the result is the LCPI on the pmullw. That dropped the use count on the original constant pool entry, but by then it was too late to trigger the fold.

Should we stop the zero_extend_vector_in_reg from constant folding multiple uses?

RKSimon added inline comments. Mar 12 2018, 6:36 AM
test/CodeGen/X86/vector-mul.ll
968

Should we stop the zero_extend_vector_in_reg from constant folding multiple uses?

Probably, but I'm wondering if it'd be better to wait until we've made progress simplifying the mixture of *_EXTEND/*_EXTEND_VECTOR_INREG/V*EXT that we currently use for SSE/AVX vector extensions?

RKSimon accepted this revision. Mar 18 2018, 7:09 AM

LGTM - please can you add a fixme/bug about the constant folding someplace

This revision is now accepted and ready to land. Mar 18 2018, 7:09 AM
craig.topper closed this revision. Mar 20 2018, 11:08 AM

Committed in 327869 but forgot to add the Differential Revision line