This is an archive of the discontinued LLVM Phabricator instance.

[DAG] SimplifyDemandedVectorElts Bug fix for rG7cb5a51f386d
AbandonedPublic

Authored by yubing on May 15 2020, 2:09 AM.

Details

Summary
In X86TargetLowering::SimplifyDemandedVectorEltsForTargetNode,
before we try to do SimplifyDemandedVectorElts for OpInputs[Src],
if OpInputs[Src] which has users excluding Op.getNode(),
we assume that all elements are needed, i.e, set SrcElts.setAllBits()

For example:
t1317: v8i32 = insert_subvector undef:v8i32, t1414, Constant:i64<0>
t1315: v8i32 = X86ISD::BLENDI t380, t1317, TargetConstant:i8<2>
t1414: v4i32 = insert_vector_elt t679, t677, Constant:i64<2>
t1416: v8i32 = X86ISD::VBROADCAST t1414

When getTargetShuffleInputs(...) processed t1416, it created
NewNode:
  v8i32 = insert_subvector undef:v8i32, t1414, Constant:i64<0>
which is the same with t1317.

So getTargetShuffleInputs(...) set
 OpInputs[0] = t1317 which is used by t1315

Before SimplifyDemandedVectorElts processes OpInputs[0] which is used by
t1315, we assume that all elements are needed, i.e. SrcElts.setAllBits()

Diff Detail

Event Timeline

yubing created this revision.May 15 2020, 2:09 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 15 2020, 2:09 AM

Really we need to stop creating nodes inside getFauxShuffle - I'm going to see if we can do this without too many regressions.

llvm/test/CodeGen/X86/simplifydemandedvectorselts-broadcast.ll
21

these bitcasts can probably be removed ? or at very least the addrspace ?

yubing updated this revision to Diff 264269.May 15 2020, 9:56 AM
yubing marked an inline comment as done.May 15 2020, 10:06 AM

Really we need to stop creating nodes inside getFauxShuffle - I'm going to see if we can do this without too many regressions.

Hi, Simon. I am wondering what benefits we can get from getFauxShuffle. Do we have a good example for it?
As I aware of, SimplifyDemandedVectorElts(...) is used to simplify a real SDNode using the information of DemandedVectorElts but It seems weird to simplify a Fauxshuffle.

I'm still looking at fixing getFauxShuffleMask (PR45974) but that might take a while, so this sort of approach is probably necessary.

Did you investigate replacing getTargetShuffleInputs with getTargetShuffleAndZeroables in the SimplifyDemandedBitsForTargetNode/SimplifyDemandedVectorEltsForTargetNode?

llvm/test/CodeGen/X86/simplifydemandedvectorselts-broadcast.ll
2

-use -mattr=+avx2 instead of -mcpu=core-avx2

yubing updated this revision to Diff 265967.May 24 2020, 9:54 PM
yubing marked an inline comment as done.EditedMay 25 2020, 12:11 AM

I'm still looking at fixing getFauxShuffleMask (PR45974) but that might take a while, so this sort of approach is probably necessary.

Did you investigate replacing getTargetShuffleInputs with getTargetShuffleAndZeroables in the SimplifyDemandedBitsForTargetNode/SimplifyDemandedVectorEltsForTargetNode?

Hi, Simon. It seems I figure out why we do getFauxShuffleMask(...) for t1416: v8i32 = X86ISD::VBROADCAST t1414.
Is that because we are able to recursively simpify t1414(the real SRC of t1416) if we are simplifying the fauxshuffle created by getFauxShuffleMask(...)?

Investigating further, I think adding a SimplifyMultipleUseDemandedBits call in the X86ISD::VBROADCAST case at the beginning of SimplifyDemandedVectorEltsForTargetNode would be a better solution going forward, and will help with the codegen in your test case. See the X86ISD::PACKS case for an example.

@yubing I think my fixes for PR45974 have addressed this now - please can you confirm?

yubing added a comment.Jun 1 2020, 1:25 AM

@yubing I think my fixes for PR45974 have addressed this now - please can you confirm?

Yeah, I've seen that bugfix and it works for my testcase. Thanks, Simon~
Besides, I am wondering in why we call SimplifyMultipleUseDemandedBits behind SimplifyDemandedVectorElts for X86ISD::VBROADCAST in SimplifyDemandedVectorEltsForTargetNode. Is that also correct calling SimplifyMultipleUseDemandedBits before SimplifyDemandedVectorElts ?

yubing abandoned this revision.Jun 1 2020, 1:29 AM

@yubing I think my fixes for PR45974 have addressed this now - please can you confirm?

Yeah, I've seen that bugfix and it works for my testcase. Thanks, Simon~
Besides, I am wondering in why we call SimplifyMultipleUseDemandedBits behind SimplifyDemandedVectorElts for X86ISD::VBROADCAST in SimplifyDemandedVectorEltsForTargetNode. Is that also correct calling SimplifyMultipleUseDemandedBits before SimplifyDemandedVectorElts ?

SimplifyMultipleUseDemandedBits is very limited as it only deals with bypassing the immediate SDValue (despite its name it handles bits + elts demand masks). SimplifyDemandedBitsForTargetNode/SimplifyDemandedVectorElts are much more capable as they can keep recursing up the DAG looking for a simplification until they reach the maximum depth limit, so it much more likely it will find a simplification and SimplifyMultipleUseDemandedBits tends to be only for cleanup (as it can ignore most OneUse limitations).