Use v8i64 ASHR instructions if we don't have VLX.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
Thanks for working on this!
Looks good, except a sitofp_2i1_float as you mentioned before.
I think that the cost model changes should be in a separate commit.
lib/Target/X86/X86TargetTransformInfo.cpp | ||
---|---|---|
210 ↗ | (On Diff #83948) | Only this? What about v8i64, v2i64? and shift left? |
test/CodeGen/X86/avx512-cvt.ll | ||
1073–1074 | I think that this issue should be resolved, or at least understood prior to commit. |
test/CodeGen/X86/avx512-cvt.ll | ||
---|---|---|
1073–1074 | FYI, this test case does something really terrible when avx512vl is enabled without avx512dq. Its get completely scalarized due to lack of vpmovm2d. |
test/CodeGen/X86/avx512-cvt.ll | ||
---|---|---|
1073–1074 | So what's happening here is that the setcc from the cmp gets its result type converted from v2i1 to v2i32 then to v2i64. Then the operands get legalized from v2i32 to v4i32 and we end up with (v2i64 sign_extend (v2i32 extract_subvector (v4i32 setcc)). The sign_extend becomes a sign_extend_inreg that is now implemented with a v2i64 vshli(32) and vsrai(32). Previously because v2i64 vsrai wasn't legal we lowered it to a v4i32 vsrai(31) and a shuffle. There was another shuffle later that wanted elements from this shuffle that didn't come from the VSRAI so it got removed. And I think the vshli(32) was able to get combined with other shuffles to produce the INSERTPS. So the main issue here is that the setcc legalizing for this produces a v2i64 type that we don't need and aren't able to recover from very well. |
test/CodeGen/X86/avx512-cvt.ll | ||
---|---|---|
1073–1074 | Thanks for the analysis, Craig - its seems to be a general problem and we're just lucky that with current codegen it only costs us an insertps and nothing more extravagant, I'm sure there are plenty of other examples with other non-legal types that we just haven't looked at. This might turn out to be a massive yak shaving issue so in the meantime I've created D28604 so that we can at least support variable shifts . |
Updated patch now that D28604 has landed - still working on the legalization issue that needs to be fixed before this patch.
I think that this issue should be resolved, or at least understood prior to commit.