Page MenuHomePhabricator

[CodeGen] Generate efficient assembly for freeze(poison) version of `mm*_cast*` intel intrinsics

Authored by aqjune on Jul 22 2022, 2:48 AM.



This patch makes the variants of mm*_cast* intel intrinsics that use shufflevector(freeze(poison), ..) emit efficient assembly.
(These intrinsics are planned to use shufflevector(freeze(poison), ..) after shufflevector's semantics update; relevant thread: D103874)

To do so, this patch

  1. Updates LowerAVXCONCAT_VECTORS in X86ISelLowering.cpp to recognize FREEZE(UNDEF) operand of CONCAT_VECTOR in addition to UNDEF
  2. Updates to recognize insert_subvector of FREEZE(UNDEF) vector as its first operand.

Diff Detail

Event Timeline

aqjune created this revision.Jul 22 2022, 2:48 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 22 2022, 2:48 AM
aqjune requested review of this revision.Jul 22 2022, 2:48 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 22 2022, 2:48 AM
aqjune added inline comments.Jul 22 2022, 2:50 AM

I found that using insert_subvector (freeze undef) instead of insert_subvector (freeze (VT undef)) causes this rule to be silently ignored during instruction selection.
What would the possible reasons be? I tried to figure it out, but couldn't.

Thank you for looking at this - its been on my todo pile for far too long!

aqjune added inline comments.Jul 29 2022, 2:02 AM

FWIW, insert_subvector (VT (freeze undef)) .. also results in generating vblendps $15, %ymm0, %ymm0, %ymm0. It seems (freeze undef) isn't being recognized. :/

Anyway, freeze (VT undef) works well. :)

RKSimon added inline comments.Jul 29 2022, 8:16 AM

Could we add a generic pattern for "undef_or_freeze_undef" that we could use here as I imagine a lot more places are going to need something like this and we should try to avoid duplication if we can?

craig.topper added inline comments.Jul 29 2022, 10:30 AM

Don't use SDTUnaryOp here. Use SDTypeProfile<1, 1, [SDTCisSameAs<0, 1>]>

SDTUnaryOp doesn't force the input and output to have the same type. I think that prevented VT from propagating through the freeze to the undef and that left the undef untyped causing the pattern to be dropped.

craig.topper added inline comments.Jul 29 2022, 10:33 AM

With the change I mentioned in you shouldn't need to mention VT near freeze or undef. The one at the start of the pattern will be enough.

RKSimon added inline comments.Aug 2 2022, 5:23 AM

Is there any reason we couldn't just always use DAG.getFreeze(DAG.getUNDEF(ResVT)) ?

aqjune updated this revision to Diff 449682.Aug 3 2022, 9:16 AM

Define undef_or_freeze_undef and use it for the new pattern

aqjune marked 3 inline comments as done.Aug 3 2022, 9:20 AM
aqjune added inline comments.

I tried using DAG.getFreeze(DAG.getUNDEF(ResVT)), and it needs updates in existing lowering functions to make the following tests pass:

  LLVM :: CodeGen/X86/haddsub-undef.ll
  LLVM :: CodeGen/X86/oddsubvector.ll
  LLVM :: CodeGen/X86/subvector-broadcast.ll
  LLVM :: CodeGen/X86/vector-interleaved-load-i16-stride-3.ll

It causes insertion of vinsert* instruction instead of efficient ops.
I think it is good to keep DAG.getUNDEF(ResVT) to avoid regression.

RKSimon added inline comments.Aug 8 2022, 8:45 AM

Thanks - I'll take a look at those cases after this patch has gone in - we have some custom vector widening patterns that we'll need to adjust to handle freeze(undef)

LGTM - @craig.topper any more comments?

This revision is now accepted and ready to land.Aug 9 2022, 9:07 AM