Page MenuHomePhabricator

[DAGCombiner] Merge consecutive stores of vector elements before types are legalized
AbandonedPublic

Authored by lkail on Jun 4 2019, 10:40 PM.

Details

Summary

Since some targets support vector shuffling, we can merge more stores before types are legalized.

Diff Detail

Event Timeline

lkail created this revision.Jun 4 2019, 10:40 PM
lkail edited the summary of this revision. (Show Details)Jun 4 2019, 10:44 PM
lkail retitled this revision from Merge consecutive stores of vector elements before types are legalized to [PowerPC] Merge consecutive stores of vector elements before types are legalized.Jun 4 2019, 10:49 PM
niravd added a comment.Jun 6 2019, 9:09 AM

This seems like it should be folded into the already existing checks. Do you know why NumStoresToMerge was not being before. I expect it's the requirement of a legal types in pre-legal merges or PPC's check for allowed misaligned accesses missing some cases. If it's the former I suspect we can disable the legality requirement prelegaltypes for non-truncated stores (replace TLI.isTypeLegal(ty) with isTypeLegal(ty)).

niravd added a subscriber: fhahn.Jun 7 2019, 11:34 AM
lkail added a comment.EditedJun 9 2019, 10:44 PM

This seems like it should be folded into the already existing checks. Do you know why NumStoresToMerge was not being before. I expect it's the requirement of a legal types in pre-legal merges or PPC's check for allowed misaligned accesses missing some cases. If it's the former I suspect we can disable the legality requirement prelegaltypes for non-truncated stores (replace TLI.isTypeLegal(ty) with isTypeLegal(ty)).

Thanks for responding, @niravd. I currently have no idea of why NumStoresToMerge was not being before. I might have a look at patches related to this portion of code. I notice that consecutive stores of vector elements was first introduced by https://reviews.llvm.org/rL224611 in which a legal type was a requirement already. I'll have a try of what you said.

lkail added a comment.Jun 17 2019, 1:27 AM

This seems like it should be folded into the already existing checks. Do you know why NumStoresToMerge was not being before. I expect it's the requirement of a legal types in pre-legal merges or PPC's check for allowed misaligned accesses missing some cases. If it's the former I suspect we can disable the legality requirement prelegaltypes for non-truncated stores (replace TLI.isTypeLegal(ty) with isTypeLegal(ty)).

Hi, @niravd, after investigate code carefully, I think this check might not be redundant. Considering the case, we have 3 i32 values extracted from vectors, both isTypeLegal and TIL.isTypeLegal see a v3i32 illegal. However, MergeStoresOfConstantsOrVecElts which is called later by MergeConsecutiveStores doesn't require type legality and builds a BUILD_VECTOR node whose elements are 3 EXTRACT_VECTOR_ELT values. PowerPC's vector type legalizer can handle such cases, so it can benefit from getNumStoresOfVectorElementsToMergePreLegalize. I know it's quite weird such check added within a context where type legality check is around. I once wanna implement it in PPCTargetLowering::PerformDAGCombine, however it might make code duplicated.

Hi, @niravd, after investigate code carefully, I think this check might not be redundant. Considering the case, we have 3 i32 values extracted from vectors, both isTypeLegal and TIL.isTypeLegal see a v3i32 illegal.

Are you certain isTypeLegal was returning false in prelegalization? isTypeLegal(x) should be "!LegalTypes || TLI.isTypeLegal(x)". I was expecting that if you swapped out TLI.isTypeLegal for isTypeLegal where it was failing, we could generate an invalid node, but it sounds like that's not the case. If so, I think we should double check.

However, MergeStoresOfConstantsOrVecElts which is called later by MergeConsecutiveStores doesn't require type legality and builds a BUILD_VECTOR node whose elements are 3 EXTRACT_VECTOR_ELT values. PowerPC's vector type legalizer can handle such cases, so it can benefit from getNumStoresOfVectorElementsToMergePreLegalize. I know it's quite weird such check added within a context where type legality check is around. I once wanna implement it in PPCTargetLowering::PerformDAGCombine, however it might make code duplicated.

lkail added a comment.Jun 18 2019, 7:17 PM

@niravd Thanks for pointing out my mistake. I tried swapping out TIL.isTypeLegal with isTypeLegal, code generated for PowerPC is not what I expect, cuz TIL.allowsMemoryAccess will return false if Ty is something like v3i32. Also this change will break some regression tests of SystemZ and X86. And I don't quite understand the meaning of 'double check' here, could you please explain more?

By double check, I just meant to look at the results again with isTypeLegal checked, which is where we are?

FWIW, It's probably fine to do something like isTypeLegal with allowsMemoryAccess, though there will likely be more changes in other backends,. I expect most to be mundane. It may be worth it to update and see if others are motivated to look into real regressions.

@niravd Thanks for pointing out my mistake. I tried swapping out TIL.isTypeLegal with isTypeLegal, code generated for PowerPC is not what I expect, cuz TIL.allowsMemoryAccess will return false if Ty is something like v3i32. Also this change will break some regression tests of SystemZ and X86. And I don't quite understand the meaning of 'double check' here, could you please explain more?

lkail updated this revision to Diff 205740.Jun 19 2019, 11:26 PM
lkail retitled this revision from [PowerPC] Merge consecutive stores of vector elements before types are legalized to [DAGCombiner] Merge consecutive stores of vector elements before types are legalized.
lkail edited the summary of this revision. (Show Details)
lkail added a comment.EditedJun 19 2019, 11:30 PM

Updated the patch. @niravd any further suggestions?

lkail edited reviewers, added: craig.topper, RKSimon, uweigand; removed: MaskRay.Jun 19 2019, 11:34 PM

Looks like it's mostly an improvement though there are some potential regressions around vector shuffles. I'll leave it to the others if it's acceptable to land.

Also, what happened to the PPC test changes?

The X86 changes LGTM

lkail added a comment.Jun 20 2019, 6:29 PM

Also, what happened to the PPC test changes?

Cuz PPC's allowsMemoryAccess lacks information about which CombineLevel is at, it fails the check(Only a few vector types are allowed to have misaligned access). As a result, currently no changes happen in PPC's code. I might try to solve this problem with another patch.

lkail added a comment.Jun 23 2019, 6:37 PM

Hi @jonpa, could you have a look if it is a real reg in SystemZ's change?

Cuz PPC's allowsMemoryAccess lacks information about which CombineLevel is at, it fails the check(Only a few vector types are allowed to have misaligned access). As a result, currently no changes happen in PPC's code. I might try to solve this problem with another patch.

Ah. Can you solve this by doing the analog to isTypeLegal (vs. isTypeLegal) and assume it's true prelegalization? If it's that small, you should (but don't feel like you must) fold it into this patch.

Hi @jonpa, could you have a look if it is a real reg in SystemZ's change?

At a first glance this seems to be generating worse code now since we need to do all those permute-type instructions in order to get the bytes into the correct order in a single register to store ... I'll clarify with the hardware folks which of the sequences would actually be preferable.

If it turns out that there are instances where *not* merging stores (even when it would be *possible*) is not preferred from a performance perspective, should common code use some cost function here?

lkail added a comment.EditedJun 25 2019, 1:26 AM

Considering suggestions of @niravd and @uweigand , is it proper to have an implementation like

bool DAGCombiner::isAbleToMergeConsecutiveStoresPreLegalize(ArrayRef<SNode*> elements, .../* Params related to align, addrspace and etc.*/) {
  if (LegalTypes)
    return false;
  // Let target decides cost considering elements to be stored.
  return TLI.canMergeConsecutiveStoresOfVectorElements(elements, ...);
}

And new check is

if (isTypeLegal(Ty) &&
    TLI.canMergeStoresTo(FirstStoreAS, Ty, DAG) &&
    ((TLI.allowsMemoryAccess(Context, DL, Ty,
                           *FirstInChain->getMemOperand(), &IsFast) &&
      IsFast) || isAbleToMergeConsecutiveStoresPreLegalize(elements, ...))

It's reasonable assuming we want to prohibit the SystemZ cases. Note that TLI.canMergeConsecutiveStoresOfVectorElements and TLI.canMergeStoresTo are both only called here and should be merged into a single method.

That said, the permute expression seems like a sign that there are some permutation peepholes may be worthwhile. We could simplify the permute into a BUILD_VECTOR of a vector stores which should be close enough to the original element-wise stores here.

Considering suggestions of @niravd and @uweigand , is it proper to have an implementation like

bool DAGCombiner::isAbleToMergeConsecutiveStoresPreLegalize(ArrayRef<SNode*> elements, .../* Params related to align, addrspace and etc.*/) {
  if (LegalTypes)
    return false;
  // Let target decides cost considering elements to be stored.
  return TLI.canMergeConsecutiveStoresOfVectorElements(elements, ...);
}

And new check is

if (isTypeLegal(Ty) &&
    TLI.canMergeStoresTo(FirstStoreAS, Ty, DAG) &&
    ((TLI.allowsMemoryAccess(Context, DL, Ty,
                           *FirstInChain->getMemOperand(), &IsFast) &&
      IsFast) || isAbleToMergeConsecutiveStoresPreLegalize(elements, ...))

@lkail Are you still looking at this please?

lkail added a comment.Aug 19 2019, 6:26 PM

Hi @RKSimon , currently I'm not working on it. I should have abandoned this patch. I might have another patch which also works for PowerPC.

lkail abandoned this revision.Aug 19 2019, 6:28 PM

Since this patch has long been not updated, I'll close it and plan another patch that also works for PowerPC.