This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] Improve tryStoreMergeOfExtracts to merge stores before type is legalized
Needs ReviewPublic

Authored by tingwang on Jun 4 2019, 10:40 PM.

Details

Summary

Implement the opportunity to merge stores from extracts before type is legalized on PPC.

Diff Detail

Event Timeline

lkail created this revision.Jun 4 2019, 10:40 PM
lkail edited the summary of this revision. (Show Details)Jun 4 2019, 10:44 PM
lkail retitled this revision from Merge consecutive stores of vector elements before types are legalized to [PowerPC] Merge consecutive stores of vector elements before types are legalized.Jun 4 2019, 10:49 PM
niravd added a comment.Jun 6 2019, 9:09 AM

This seems like it should be folded into the already existing checks. Do you know why NumStoresToMerge was not being before. I expect it's the requirement of a legal types in pre-legal merges or PPC's check for allowed misaligned accesses missing some cases. If it's the former I suspect we can disable the legality requirement prelegaltypes for non-truncated stores (replace TLI.isTypeLegal(ty) with isTypeLegal(ty)).

niravd added a subscriber: fhahn.Jun 7 2019, 11:34 AM
lkail added a comment.EditedJun 9 2019, 10:44 PM

This seems like it should be folded into the already existing checks. Do you know why NumStoresToMerge was not being before. I expect it's the requirement of a legal types in pre-legal merges or PPC's check for allowed misaligned accesses missing some cases. If it's the former I suspect we can disable the legality requirement prelegaltypes for non-truncated stores (replace TLI.isTypeLegal(ty) with isTypeLegal(ty)).

Thanks for responding, @niravd. I currently have no idea of why NumStoresToMerge was not being before. I might have a look at patches related to this portion of code. I notice that consecutive stores of vector elements was first introduced by https://reviews.llvm.org/rL224611 in which a legal type was a requirement already. I'll have a try of what you said.

lkail added a comment.Jun 17 2019, 1:27 AM

This seems like it should be folded into the already existing checks. Do you know why NumStoresToMerge was not being before. I expect it's the requirement of a legal types in pre-legal merges or PPC's check for allowed misaligned accesses missing some cases. If it's the former I suspect we can disable the legality requirement prelegaltypes for non-truncated stores (replace TLI.isTypeLegal(ty) with isTypeLegal(ty)).

Hi, @niravd, after investigate code carefully, I think this check might not be redundant. Considering the case, we have 3 i32 values extracted from vectors, both isTypeLegal and TIL.isTypeLegal see a v3i32 illegal. However, MergeStoresOfConstantsOrVecElts which is called later by MergeConsecutiveStores doesn't require type legality and builds a BUILD_VECTOR node whose elements are 3 EXTRACT_VECTOR_ELT values. PowerPC's vector type legalizer can handle such cases, so it can benefit from getNumStoresOfVectorElementsToMergePreLegalize. I know it's quite weird such check added within a context where type legality check is around. I once wanna implement it in PPCTargetLowering::PerformDAGCombine, however it might make code duplicated.

Hi, @niravd, after investigate code carefully, I think this check might not be redundant. Considering the case, we have 3 i32 values extracted from vectors, both isTypeLegal and TIL.isTypeLegal see a v3i32 illegal.

Are you certain isTypeLegal was returning false in prelegalization? isTypeLegal(x) should be "!LegalTypes || TLI.isTypeLegal(x)". I was expecting that if you swapped out TLI.isTypeLegal for isTypeLegal where it was failing, we could generate an invalid node, but it sounds like that's not the case. If so, I think we should double check.

However, MergeStoresOfConstantsOrVecElts which is called later by MergeConsecutiveStores doesn't require type legality and builds a BUILD_VECTOR node whose elements are 3 EXTRACT_VECTOR_ELT values. PowerPC's vector type legalizer can handle such cases, so it can benefit from getNumStoresOfVectorElementsToMergePreLegalize. I know it's quite weird such check added within a context where type legality check is around. I once wanna implement it in PPCTargetLowering::PerformDAGCombine, however it might make code duplicated.

lkail added a comment.Jun 18 2019, 7:17 PM

@niravd Thanks for pointing out my mistake. I tried swapping out TIL.isTypeLegal with isTypeLegal, code generated for PowerPC is not what I expect, cuz TIL.allowsMemoryAccess will return false if Ty is something like v3i32. Also this change will break some regression tests of SystemZ and X86. And I don't quite understand the meaning of 'double check' here, could you please explain more?

By double check, I just meant to look at the results again with isTypeLegal checked, which is where we are?

FWIW, It's probably fine to do something like isTypeLegal with allowsMemoryAccess, though there will likely be more changes in other backends,. I expect most to be mundane. It may be worth it to update and see if others are motivated to look into real regressions.

@niravd Thanks for pointing out my mistake. I tried swapping out TIL.isTypeLegal with isTypeLegal, code generated for PowerPC is not what I expect, cuz TIL.allowsMemoryAccess will return false if Ty is something like v3i32. Also this change will break some regression tests of SystemZ and X86. And I don't quite understand the meaning of 'double check' here, could you please explain more?

lkail updated this revision to Diff 205740.Jun 19 2019, 11:26 PM
lkail retitled this revision from [PowerPC] Merge consecutive stores of vector elements before types are legalized to [DAGCombiner] Merge consecutive stores of vector elements before types are legalized.
lkail edited the summary of this revision. (Show Details)
lkail added a comment.EditedJun 19 2019, 11:30 PM

Updated the patch. @niravd any further suggestions?

lkail edited reviewers, added: craig.topper, RKSimon, uweigand; removed: MaskRay.Jun 19 2019, 11:34 PM

Looks like it's mostly an improvement though there are some potential regressions around vector shuffles. I'll leave it to the others if it's acceptable to land.

Also, what happened to the PPC test changes?

The X86 changes LGTM

lkail added a comment.Jun 20 2019, 6:29 PM

Also, what happened to the PPC test changes?

Cuz PPC's allowsMemoryAccess lacks information about which CombineLevel is at, it fails the check(Only a few vector types are allowed to have misaligned access). As a result, currently no changes happen in PPC's code. I might try to solve this problem with another patch.

lkail added a comment.Jun 23 2019, 6:37 PM

Hi @jonpa, could you have a look if it is a real reg in SystemZ's change?

Cuz PPC's allowsMemoryAccess lacks information about which CombineLevel is at, it fails the check(Only a few vector types are allowed to have misaligned access). As a result, currently no changes happen in PPC's code. I might try to solve this problem with another patch.

Ah. Can you solve this by doing the analog to isTypeLegal (vs. isTypeLegal) and assume it's true prelegalization? If it's that small, you should (but don't feel like you must) fold it into this patch.

Hi @jonpa, could you have a look if it is a real reg in SystemZ's change?

At a first glance this seems to be generating worse code now since we need to do all those permute-type instructions in order to get the bytes into the correct order in a single register to store ... I'll clarify with the hardware folks which of the sequences would actually be preferable.

If it turns out that there are instances where *not* merging stores (even when it would be *possible*) is not preferred from a performance perspective, should common code use some cost function here?

lkail added a comment.EditedJun 25 2019, 1:26 AM

Considering suggestions of @niravd and @uweigand , is it proper to have an implementation like

bool DAGCombiner::isAbleToMergeConsecutiveStoresPreLegalize(ArrayRef<SNode*> elements, .../* Params related to align, addrspace and etc.*/) {
  if (LegalTypes)
    return false;
  // Let target decides cost considering elements to be stored.
  return TLI.canMergeConsecutiveStoresOfVectorElements(elements, ...);
}

And new check is

if (isTypeLegal(Ty) &&
    TLI.canMergeStoresTo(FirstStoreAS, Ty, DAG) &&
    ((TLI.allowsMemoryAccess(Context, DL, Ty,
                           *FirstInChain->getMemOperand(), &IsFast) &&
      IsFast) || isAbleToMergeConsecutiveStoresPreLegalize(elements, ...))

It's reasonable assuming we want to prohibit the SystemZ cases. Note that TLI.canMergeConsecutiveStoresOfVectorElements and TLI.canMergeStoresTo are both only called here and should be merged into a single method.

That said, the permute expression seems like a sign that there are some permutation peepholes may be worthwhile. We could simplify the permute into a BUILD_VECTOR of a vector stores which should be close enough to the original element-wise stores here.

Considering suggestions of @niravd and @uweigand , is it proper to have an implementation like

bool DAGCombiner::isAbleToMergeConsecutiveStoresPreLegalize(ArrayRef<SNode*> elements, .../* Params related to align, addrspace and etc.*/) {
  if (LegalTypes)
    return false;
  // Let target decides cost considering elements to be stored.
  return TLI.canMergeConsecutiveStoresOfVectorElements(elements, ...);
}

And new check is

if (isTypeLegal(Ty) &&
    TLI.canMergeStoresTo(FirstStoreAS, Ty, DAG) &&
    ((TLI.allowsMemoryAccess(Context, DL, Ty,
                           *FirstInChain->getMemOperand(), &IsFast) &&
      IsFast) || isAbleToMergeConsecutiveStoresPreLegalize(elements, ...))

@lkail Are you still looking at this please?

lkail added a comment.Aug 19 2019, 6:26 PM

Hi @RKSimon , currently I'm not working on it. I should have abandoned this patch. I might have another patch which also works for PowerPC.

lkail abandoned this revision.Aug 19 2019, 6:28 PM

Since this patch has long been not updated, I'll close it and plan another patch that also works for PowerPC.

tingwang commandeered this revision.Jul 31 2022, 11:24 PM
tingwang added a reviewer: lkail.
Herald added a project: Restricted Project. · View Herald TranscriptJul 31 2022, 11:24 PM
tingwang updated this revision to Diff 448921.Jul 31 2022, 11:45 PM
tingwang retitled this revision from [DAGCombiner] Merge consecutive stores of vector elements before types are legalized to [DAGCombiner] Improve tryStoreMergeOfExtracts by using double sized vector type before type legalized .
tingwang edited the summary of this revision. (Show Details)
tingwang edited reviewers, added: shchenz, qiucf, Esme, Restricted Project; removed: hfinkel, jsji, steven.zhang.
tingwang removed a project: Restricted Project.

With this change, the SystemZ case is not touched. Two x86 cases still need to confirm.

According to previous comments, it may be better to have some cost function to decide if merge is preferred or not. I'm not sure how to implement the cost function, one reason is that the final code sequence generated may depends on target specific flag etc., and backend target at this stage may not have enough information to decide. For example results in extract-and-store.ll show dependence on -ppc-disable-perfect-shuffle=false.

Herald added a project: Restricted Project. · View Herald TranscriptJul 31 2022, 11:45 PM

Instead of a cost function, could we use (possibly tweaked) isMultiStoresCheaperThanBitsMerge?

llvm/test/CodeGen/PowerPC/extract-and-store.ll
533

is this really an issue with the store merging or the ppc shuffle combines have gotten messed up?

Instead of a cost function, could we use (possibly tweaked) isMultiStoresCheaperThanBitsMerge?

Thank you for pointing out. I will try to come up with something to guard against cases that got degenerated.

llvm/test/CodeGen/PowerPC/extract-and-store.ll
533

Had a quick check, this is the case PPC::LowerVECTOR_SHUFFLE does not have efficient solution, so it turned into VPERM as last resort.

Probably the cost function should avoid this kind of situation.

RKSimon added inline comments.Aug 1 2022, 5:12 AM
llvm/test/CodeGen/PowerPC/extract-and-store.ll
533

I'd be nervous about a cost function as that is likely to be very difficult to keep balanced. I'd probably recommend just overriding canMergeStoresTo or isMultiStoresCheaperThanBitsMerge for PPC

tingwang updated this revision to Diff 449219.Aug 2 2022, 1:37 AM

Added cost function for target as suggested. The function on PPC tries to avoid some patterns that lead to TOC accesses.

tingwang added inline comments.Aug 2 2022, 1:42 AM
llvm/test/CodeGen/PowerPC/extract-and-store.ll
533

Thank you for the advice. I created a new function, hope that is fine.

This code seems quite unnecessarily complex. I can achieve essentially the same results with something like this:

diff --git a/llvm/include/llvm/CodeGen/TargetLowering.h b/llvm/include/llvm/CodeGen/TargetLowering.h
index 36973f5bddb0..984e84ba6fdc 100644
--- a/llvm/include/llvm/CodeGen/TargetLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetLowering.h
@@ -939,6 +939,9 @@ public:
            (unsigned)VT.getSimpleVT().SimpleTy < array_lengthof(RegClassForVT));
     return VT.isSimple() && RegClassForVT[VT.getSimpleVT().SimpleTy] != nullptr;
   }
+  virtual bool isTypeLegalForMemAccess(EVT VT) const {
+    return isTypeLegal(VT);
+  }
 
   class ValueTypeActionImpl {
     /// ValueTypeActions - For each value type, keep a LegalizeTypeAction enum
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index 5e77317572af..6acde2a5ae91 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -18486,7 +18486,7 @@ bool DAGCombiner::tryStoreMergeOfExtracts(
       if (Ty.getSizeInBits() > MaximumLegalStoreInBits)
         break;
 
-      if (TLI.isTypeLegal(Ty) &&
+      if (TLI.isTypeLegalForMemAccess(Ty) &&
           TLI.canMergeStoresTo(FirstStoreAS, Ty, DAG.getMachineFunction()) &&
           TLI.allowsMemoryAccess(Context, DL, Ty,
                                  *FirstInChain->getMemOperand(), &IsFast) &&
diff --git a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
index 862d2ebc75a6..ceed4c1ffc91 100644
--- a/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
+++ b/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
@@ -17569,7 +17569,8 @@ bool PPCTargetLowering::allowsMisalignedMemoryAccesses(EVT VT,
       return true;
     if (Subtarget.hasVSX()) {
       if (VT != MVT::v2f64 && VT != MVT::v2i64 &&
-          VT != MVT::v4f32 && VT != MVT::v4i32)
+          VT != MVT::v4f32 && VT != MVT::v4i32 &&
+          VT != MVT::v2f32 && VT != MVT::v2i32)
         return false;
     } else {
       return false;
diff --git a/llvm/lib/Target/PowerPC/PPCISelLowering.h b/llvm/lib/Target/PowerPC/PPCISelLowering.h
index 2fa6d45bfe1a..1f0051f8d273 100644
--- a/llvm/lib/Target/PowerPC/PPCISelLowering.h
+++ b/llvm/lib/Target/PowerPC/PPCISelLowering.h
@@ -1101,6 +1101,11 @@ namespace llvm {
     EVT getOptimalMemOpType(const MemOp &Op,
                             const AttributeList &FuncAttributes) const override;
 
+    bool isTypeLegalForMemAccess(EVT VT) const override {
+      bool Ret = TargetLoweringBase::isTypeLegalForMemAccess(VT) || VT == MVT::v2i32 || VT == MVT::v2f32;
+      return Ret;
+    }
+
     /// Is unaligned memory access allowed for the given type, and is it fast
     /// relative to software emulation.
     bool allowsMisalignedMemoryAccesses(

Sure, it produces some vperm's with this test case, but I don't see an issue with that - in most cases that matter, the constant pool loads aren't likely to lead to a lot of cache misses.

Thank you. A good lesson for me to learn how to simplify logic!

tingwang updated this revision to Diff 451048.Aug 8 2022, 10:43 PM
tingwang retitled this revision from [DAGCombiner] Improve tryStoreMergeOfExtracts by using double sized vector type before type legalized to [DAGCombiner] Improve tryStoreMergeOfExtracts to merge stores before type is legalized.
tingwang edited the summary of this revision. (Show Details)

The original approach is too complex, and the same effect can be achieved more simply as Nemanja pointed out.

I'm adopting the whole approach, and added a guard to make sure this is applied only before type is legalized.

RKSimon added inline comments.Aug 9 2022, 2:27 AM
llvm/include/llvm/CodeGen/TargetLowering.h
945

please add doxygen descriptions of the params (B in particular....)

tingwang updated this revision to Diff 451097.Aug 9 2022, 3:57 AM

Update parameter naming and comments.