This is an archive of the discontinued LLVM Phabricator instance.

[X86] Type legalize v2i32 div/rem by scalarizing rather than promoting
ClosedPublic

Authored by craig.topper on Aug 27 2018, 2:31 PM.

Details

Summary

Previously we type legalized v2i32 div/rem by promoting to v2i64. But we don't support div/rem of vectors so op legalization would then scalarize it using i64 scalar ops since it doesn't know about the original promotion. 64-bit scalar divides on Intel hardware are known to be slow and in 32-bit mode they require a libcall.

This patch switches type legalization to do the scalarizing itself using i32.

It looks like the division by power of 2 optimization is still kicking in and leaving the code as a vector. The division by other constant optimization doesn't kick in pre type legalization since it ignores illegal types. And previously, after type legalization we scalarized the v2i64 since we don't have v2i64 MULHS/MULHU support.

Another option might be to widen v2i32 to v4i32 so we could do division by constant optimizations, but we'd have to be careful to only do that for constant divisors or we risk scalaring to 4 scalar divides.

Diff Detail

Event Timeline

craig.topper created this revision.Aug 27 2018, 2:31 PM

Doh! I got my check-prefixes reversed. Let me fix that.

Fix 32-bit and 64-bit check line swap

RKSimon added inline comments.Aug 28 2018, 3:54 AM
test/CodeGen/X86/vector-idiv-v2i32.ll
573

Noticed on D50636 - shouldn't this be loading from %y ? Same for other tests below.

Rebase after fixing tets

Disable the new custom scalarizing if we are legalizing v2i32 via widening instead or promoting. The generic type legalizer knows to scalarize v2i32 div/rem in that case since it can't widen a trapping operation.

This reminds me of D30296 - although I haven't checked how much crossover there really is

This reminds me of D30296 - although I haven't checked how much crossover there really is

D50791 also?

This is kind similar, but here the issue isn't the number of scalar operations, it's the size of the scalar operations. This probably also applies to v2i8 and v2i16 and any other v2iX type where X is 32 or less.

spatel accepted this revision.Sep 12 2018, 10:51 AM

LGTM

This revision is now accepted and ready to land.Sep 12 2018, 10:51 AM
This revision was automatically updated to reflect the committed changes.