This is an archive of the discontinued LLVM Phabricator instance.

[X86][AVX512] Add support for ASHR v2i64/v4i64 support without VLX
ClosedPublic

Authored by RKSimon on Jan 10 2017, 3:01 PM.

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon updated this revision to Diff 83880.Jan 10 2017, 3:01 PM
RKSimon retitled this revision from to [X86][AVX512] Add support for ASHR v2i64/v4i64 support without VLX.
RKSimon updated this object.
RKSimon added reviewers: delena, mkuper, igorb, craig.topper.
RKSimon set the repository for this revision to rL LLVM.
RKSimon added a subscriber: llvm-commits.
igorb edited edge metadata.Jan 11 2017, 1:22 AM

Thanks for working on this!
Looks good, except a sitofp_2i1_float as you mentioned before.

RKSimon updated this revision to Diff 83948.Jan 11 2017, 3:55 AM
RKSimon edited edge metadata.

Updated cost model

delena edited edge metadata.Jan 11 2017, 4:10 AM

I think that the cost model changes should be in a separate commit.

lib/Target/X86/X86TargetTransformInfo.cpp
210 ↗(On Diff #83948)

Only this? What about v8i64, v2i64? and shift left?

test/CodeGen/X86/avx512-cvt.ll
943 ↗(On Diff #83948)

I think that this issue should be resolved, or at least understood prior to commit.

craig.topper added inline comments.Jan 11 2017, 12:30 PM
test/CodeGen/X86/avx512-cvt.ll
943 ↗(On Diff #83948)

FYI, this test case does something really terrible when avx512vl is enabled without avx512dq. Its get completely scalarized due to lack of vpmovm2d.

craig.topper added inline comments.Jan 11 2017, 6:05 PM
test/CodeGen/X86/avx512-cvt.ll
943 ↗(On Diff #83948)

So what's happening here is that the setcc from the cmp gets its result type converted from v2i1 to v2i32 then to v2i64. Then the operands get legalized from v2i32 to v4i32 and we end up with (v2i64 sign_extend (v2i32 extract_subvector (v4i32 setcc)). The sign_extend becomes a sign_extend_inreg that is now implemented with a v2i64 vshli(32) and vsrai(32). Previously because v2i64 vsrai wasn't legal we lowered it to a v4i32 vsrai(31) and a shuffle. There was another shuffle later that wanted elements from this shuffle that didn't come from the VSRAI so it got removed. And I think the vshli(32) was able to get combined with other shuffles to produce the INSERTPS.

So the main issue here is that the setcc legalizing for this produces a v2i64 type that we don't need and aren't able to recover from very well.

RKSimon added inline comments.Jan 12 2017, 7:29 AM
test/CodeGen/X86/avx512-cvt.ll
943 ↗(On Diff #83948)

Thanks for the analysis, Craig - its seems to be a general problem and we're just lucky that with current codegen it only costs us an insertps and nothing more extravagant, I'm sure there are plenty of other examples with other non-legal types that we just haven't looked at.

This might turn out to be a massive yak shaving issue so in the meantime I've created D28604 so that we can at least support variable shifts .

RKSimon updated this revision to Diff 84292.Jan 13 2017, 6:13 AM
RKSimon edited edge metadata.

Updated patch now that D28604 has landed - still working on the legalization issue that needs to be fixed before this patch.

RKSimon updated this revision to Diff 88890.Feb 17 2017, 7:31 AM

Rebased now that D29454 has landed

RKSimon edited the summary of this revision. (Show Details)Feb 19 2017, 8:40 AM

OK to commit now that D29454 has fixed the legalization issue?

This revision is now accepted and ready to land.Feb 19 2017, 7:54 PM
This revision was automatically updated to reflect the committed changes.