Differential D79003
[DAG] Add SimplifyDemandedVectorElts binop SimplifyMultipleUseDemandedBits handling
Closed, Public. Authored by RKSimon on Apr 28 2020, 6:41 AM.
Summary
For the supported binops (basic arithmetic, logicals + shifts), if we fail to simplify the demanded vector elts, then call SimplifyMultipleUseDemandedBits and try to peek through ops to remove unnecessary dependencies. This helps with PR40502.
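To make the summary concrete, here is a minimal sketch of the "peek through" idea on a toy 4-lane vector DAG. Everything in it (Node, Dag, peekThrough, simplifyBinop) is a hypothetical stand-in written for illustration; it does not use or reproduce LLVM's SelectionDAG API. The assumption is that an operand with other users cannot be rewritten in place, but the binop can be rebuilt on top of a simpler existing node that agrees with the operand in every demanded lane.

```cpp
#include <memory>
#include <vector>

// Toy model of a 4-lane vector DAG. These types are illustrative only and
// are not LLVM's SelectionDAG classes.
enum class Kind { Leaf, InsertLane, Add };

struct Node {
  Kind kind;
  unsigned lane;    // InsertLane only: the lane that gets overwritten
  const Node *lhs;  // InsertLane: base vector; Add: first operand
  const Node *rhs;  // Add: second operand
};

struct Dag {
  std::vector<std::unique_ptr<Node>> nodes;

  const Node *make(Kind k, unsigned lane, const Node *l, const Node *r) {
    nodes.push_back(std::unique_ptr<Node>(new Node{k, lane, l, r}));
    return nodes.back().get();
  }
  const Node *leaf() { return make(Kind::Leaf, 0, nullptr, nullptr); }
  const Node *insertLane(const Node *base, unsigned lane) {
    return make(Kind::InsertLane, lane, base, nullptr);
  }
  const Node *add(const Node *l, const Node *r) {
    return make(Kind::Add, 0, l, r);
  }
};

// Returns an existing node that matches `n` in every demanded lane, or
// nullptr if none is found. `n` itself is left untouched, so its other
// users are unaffected.
const Node *peekThrough(const Node *n, unsigned demandedLanes) {
  if (n->kind == Kind::InsertLane && !(demandedLanes & (1u << n->lane))) {
    // The overwritten lane is not demanded, so the base vector is enough.
    const Node *deeper = peekThrough(n->lhs, demandedLanes);
    return deeper ? deeper : n->lhs;
  }
  return nullptr;
}

// Rebuilds a binop from peeked-through operands. Returns nullptr when
// neither operand could be simplified.
const Node *simplifyBinop(Dag &dag, const Node *op, unsigned demandedLanes) {
  const Node *l = peekThrough(op->lhs, demandedLanes);
  const Node *r = peekThrough(op->rhs, demandedLanes);
  if (!l && !r)
    return nullptr;
  return dag.add(l ? l : op->lhs, r ? r : op->rhs);
}
```

In this sketch, an add whose first operand overwrites lane 3 of x can be rebuilt directly on x when only lanes 0 and 1 are demanded, which drops the dependency on the insert.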
Event Timeline
Seems reasonable to me in general.
RKSimon added inline comments.
The code duplication is making me itchy. Add a helper like:
    if (simplifyDemandedVectorEltsBinop(Op, DemandedElts, TLO, Depth))
      return true;
That could include just the new block that's being created here, or we can dispatch directly on all of the binop cases in the top-level and then switch for the KnownZero/Undef/DemandedBits differences within there.
Cheers @spatel - I've pulled out the SimplifyDemandedVectorEltsBinOp helper, which makes separate DemandedBits 'all bits' masks for each op; that should also answer the query about the fpops.
LGTM.
This revision is now accepted and ready to land. May 24 2020, 9:50 AM
I'll move the CombineTo handling into the lambda as well. The isAllOnes() check I'm going to leave out until I can work out how best to merge it with KnownUndef, as mentioned in the TODOs.
Closed by commit rG9fa58d1bf2f8: [DAG] Add SimplifyDemandedVectorElts binop SimplifyMultipleUseDemandedBits… (authored by RKSimon). May 25 2020, 4:46 AM
This revision was automatically updated to reflect the committed changes.
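The "separate 'all bits' masks for each op" mentioned in the thread can be pictured with a small standalone sketch. It assumes each operand carries its own scalar bit width; Operand, allOnesMask and demandedBitsForBinop are hypothetical names chosen for illustration, and plain 64-bit integers stand in for LLVM's arbitrary-precision masks.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical stand-in for a binop operand: only its scalar width matters.
struct Operand {
  unsigned scalarBits;
};

// All-ones mask of the given width. A shift by 64 would be undefined
// behaviour, so the full-width case is handled separately.
std::uint64_t allOnesMask(unsigned bits) {
  return bits >= 64 ? ~std::uint64_t{0} : (std::uint64_t{1} << bits) - 1;
}

// One mask per operand, each sized to that operand's own scalar width,
// rather than a single mask reused for both operands.
std::pair<std::uint64_t, std::uint64_t>
demandedBitsForBinop(const Operand &op0, const Operand &op1) {
  return {allOnesMask(op0.scalarBits), allOnesMask(op1.scalarBits)};
}
```

Because every bit of each operand is demanded, the only narrowing left comes from the demanded elements, which keeps the query independent of how the opcode treats individual bits.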
Revision Contents
Diff 265932
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
llvm/test/CodeGen/AArch64/mul_by_elt.ll
llvm/test/CodeGen/X86/combine-pmuldq.ll
llvm/test/CodeGen/X86/combine-sdiv.ll
llvm/test/CodeGen/X86/oddsubvector.ll
llvm/test/CodeGen/X86/vector-fshl-rot-128.ll
llvm/test/CodeGen/X86/vector-fshl-rot-256.ll
llvm/test/CodeGen/X86/vector-fshr-rot-128.ll
llvm/test/CodeGen/X86/vector-fshr-rot-256.ll
llvm/test/CodeGen/X86/vector-narrow-binop.ll
Is it correct to recycle the demanded bits on both operands here? That seems wrong for FP ops IIUC.
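One possible reading of this concern, offered as an illustration rather than as the reviewer's stated reasoning: for an integer add, the low bits of the result depend only on the low bits of the operands, so a partial demanded-bits mask can be passed down to both operands. A floating-point add has no such property, because exponent and rounding make every result bit depend on all operand bits. The sketch below checks this on one pair of 32-bit patterns; the helper names are hypothetical, and it assumes IEEE-754 binary32 floats with subnormals enabled.

```cpp
#include <cstdint>
#include <cstring>

// Bit-level views of a float (assumes a 32-bit IEEE-754 float).
std::uint32_t bitsOf(float f) {
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof u);
  return u;
}

float floatOf(std::uint32_t u) {
  float f;
  std::memcpy(&f, &u, sizeof f);
  return f;
}

// Low 8 bits of an integer add of the two bit patterns.
std::uint32_t intAddLow8(std::uint32_t a, std::uint32_t b) {
  return (a + b) & 0xFFu;
}

// Low 8 bits of the result pattern of a floating-point add.
std::uint32_t fpAddLow8(std::uint32_t aBits, std::uint32_t bBits) {
  return bitsOf(floatOf(aBits) + floatOf(bBits)) & 0xFFu;
}
```

Comparing each function on the full patterns 0x3F800001 and 0x3F800000 against the same patterns masked with 0xFF shows the difference: the integer results agree, the floating-point results do not. Demanding all bits of each operand avoids the problem.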