This is an archive of the discontinued LLVM Phabricator instance.

[X86] `DAGTypeLegalizer::ModifyToType()`: when widening w/ zeros, insert into undef and `and`-mask the padding away
ClosedPublic

Authored by lebedev.ri on Oct 16 2022, 12:22 PM.

Details

Summary

We can expect that the sequence of inserting-of-extracts-into-undef
will be successfully lowered back into widening of the source vector,
but it seems that at least for X86 mask vectors, we have a really hard time
recovering from inserting-into-zero.

I've looked into alternative fix injection points, and they are much more involved,
by the time of LowerBUILD_VECTORvXi1()/LowerINSERT_VECTOR_ELT() the constants might be obscured,
so it does not seem like we can easily deal with this by lowering into bit math later on,
some other pieces are missing.

Instead, it seems like just clearing the padding away via an AND-mask
is at least not a worse choice. Why create a problem where there wasn't one.
Though yes, it is possible that there are cases where constants originate
from the source IR, so some other fix may still be needed.

Diff Detail

Event Timeline

lebedev.ri created this revision.Oct 16 2022, 12:22 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 16 2022, 12:22 PM
lebedev.ri requested review of this revision.Oct 16 2022, 12:22 PM
RKSimon added inline comments.Oct 16 2022, 12:51 PM
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
7065

Why not use append?

SmallVector<SDValue, 16> MaskOps;
MaskOps.append(MinNumElts, DAG.getAllOnesConstant(dl, EltVT));
MaskOps.append(WidenNumElts - MinNumElts, DAG.getConstant(0, dl, EltVT));
7068

why the braces?

lebedev.ri marked 2 inline comments as done.

Thanks for taking a look!
Address review comments.

pengfei added inline comments.Oct 16 2022, 6:41 PM
llvm/test/CodeGen/X86/masked_store.ll
6214–6224

The transform looks correct to me.
Just a tangential question: I didn't see any alignment info assigned to the pointers, why are we using vmovdqa64 for %trigger.ptr but vmovdqu32 for dst?
I have this question because we cannot load the garbage data into zmm1 if it is not aligned, since it may cause memory fault.

lebedev.ri marked an inline comment as done.Oct 17 2022, 6:55 AM
lebedev.ri added inline comments.
llvm/test/CodeGen/X86/masked_store.ll
6214–6224

Because plain loads ended up being align 128 (by default),
while i have explicitly specified align of i32 immarg 1 for @llvm.masked.store.

pengfei added inline comments.Oct 17 2022, 7:27 AM
llvm/test/CodeGen/X86/masked_store.ll
6214–6224

Thanks @lebedev.ri for the explanations. I can see the @llvm.masked.store now, but still not sure why align 128 generates vmovdqa64. Should it require align 512?

pengfei added inline comments.Oct 17 2022, 7:28 AM
llvm/test/CodeGen/X86/masked_store.ll
6214–6224

Sorry, I see your point. It's 128 bytes not bits :)

lebedev.ri marked an inline comment as done.Oct 18 2022, 5:42 PM

So is everyone happy with the particular fix?
Does anyone want to stamp? :)

pengfei accepted this revision.Oct 18 2022, 6:06 PM

LGTM.

This revision is now accepted and ready to land.Oct 18 2022, 6:06 PM