This is an archive of the discontinued LLVM Phabricator instance.

[SelectionDAG] Handle unary SelectPatternFlavor for ABS case in SelectionDAGBuilder::visitSelect.
ClosedPublic

Authored by ikulagin on Jul 26 2018, 12:40 AM.

Details

Summary

[SelectionDAG] Handle unary SelectPatternFlavor for ABS case in SelectionDAGBuilder::visitSelect.

These changes are related to PR37743 and include:

  1. SelectionDAGBuilder::visitSelect handles the unary SelectPatternFlavor::SPF_ABS case to build ABS node.
  2. Delete the redundant recognizer of the integer ABS pattern from the DAGCombiner.
  3. Add promoting the integer ABS node in the LegalizeIntegerType.
  4. Expand-based legalization of integer result for the ABS nodes.
  5. Expand-based legalization of ABS vector operations.
  6. Add some integer abs testcases for different typesizes for Thumb arch
  7. Add the custom ABS expanding and change the SAD pattern recognizer for X86 arch: The i64 result of the ABS is expanded to:
tmp = (SRA, Hi, 31)
Lo = (UADDO tmp, Lo)
Hi = (XOR tmp, (ADDCARRY tmp, hi, Lo:1))
Lo = (XOR tmp, Lo)

The "detectZextAbsDiff" function is changed for the recognition of pattern
with the ABS node. Given a ABS node, detect the following pattern:
(ABS (SUB (ZERO_EXTEND a), (ZERO_EXTEND b))).

  1. Change integer abs testcases for codegen with the ABS node support for AArch64.
    • Indicate that the ABS is legal for the i64 type when the NEON is supported.
    • Change the integer abs testcases to show changing of codegen.
  2. Add combine and legalization of ABS nodes for Thumb arch.
  3. Extend 'matchSelectPattern' to recognize the ABS patterns with ICMP_SGE condition.

For discussion, see https://bugs.llvm.org/show_bug.cgi?id=37743

Diff Detail

Event Timeline

ikulagin created this revision.Jul 26 2018, 12:40 AM

Note: this change was mentioned in D48754. That patch was split into several pieces, and I think that's the last planned improvement for IR abs() detection/canonicalization.

The IR pattern produced by D48754 is already matched into ISD::ABS? Or is there a third differential?

lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
1388–1390 ↗(On Diff #157427)

Inconsistent with the rest.
Unless the final line is longer than 80-chars, make it one line?

The IR pattern produced by D48754 is already matched into ISD::ABS? Or is there a third differential?

This patch produces ISD::ABS when ValueTracking recognizes the pattern from D48754.

But, what DAG should be produced when ValueTraking recognizes the SPF_NABS pattern?
There is no NABS type for DAG nodes. I propose the following (SUB 0, (ABS X)).

lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
1388–1390 ↗(On Diff #157427)

The code was formatted by clang-format.
Should I reformat it like the rest?

Please merge your code with the patch in D48754 and run related tests to check there is no failure.

lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
2135 ↗(On Diff #157427)

add some test for this function?

lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
927 ↗(On Diff #157427)

same as above, need some testcases for this code change

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
2918 ↗(On Diff #157427)

I am a little confused for IsUnary since abs is not the only unary node. changing to one related with abs?

2981 ↗(On Diff #157427)

I think matchSelectionPattern can only find integer abs for now. Maybe we should only consider ISD::ABS for now and add ISD::FABS when matchSelectionPattern supports fabs?

spatel added inline comments.Jul 27 2018, 4:21 AM
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
2981 ↗(On Diff #157427)

This was discussed in PR37743. We already have an llvm.fabs intrinsic, so I don't see why matchSelectPattern() would ever need to match fabs().

matchSelectPattern() is used for common operations that do *not* have LLVM intrinsics, so they are implemented using 'select' IR.

The fabs code here should be removed from this patch.

2985 ↗(On Diff #157427)

I think SPF_NABS will be as described in the summary - sub(0, abs(X)).

Unless there is evidence that the negation gets separated from the ABS dag node and we're failing to produce 'nabs' codegen for targets that support that op? In that case, we might add a ISD::NABS node.

Either way, let's leave nabs for a follow-up patch.

Please merge your code with the patch in D48754 and run related tests to check there is no failure.

I merged my code with the latest commits and run tests for X86 arch. For X86 everything is OK.
Now, I plan to make tests for other architectures and test them.
Or should I leave this for follow-up patches?

lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
2135 ↗(On Diff #157427)

I think for this function test cases are architecture-specific. For example, the X86 tests contain the iabs.ll tests with few types.

lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
927 ↗(On Diff #157427)

I think, tests are arсhitecture-specific too.

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
2918 ↗(On Diff #157427)

Yes, but the SelectPatternFlavor contains only ABS/NABS patterns relating to unary node. Should I rename it to IsUnaryAbs?

2981 ↗(On Diff #157427)

Ok, I'll remove it.

2985 ↗(On Diff #157427)

Ok, I'll implement this in another patch later.

Please merge your code with the patch in D48754 and run related tests to check there is no failure.

I merged my code with the latest commits and run tests for X86 arch. For X86 everything is OK.
Now, I plan to make tests for other architectures and test them.
Or should I leave this for follow-up patches?

I strongly suggest you run tests for all arch for this patch since your code does not only impact X86.
When I did this task by myself, I found there is some case need to promote ISD::ABS for some type, because I hit assert "Do not know how to promote this operator!". You should also meets this issue if you run check for other arch.
Below is one case for your reference:

: 'RUN: at line 1'; /home/shchenz/llvm/llvm/llvm/build/bin/llc -verify-machineinstrs < /home/shchenz/llvm/llvm/llvm/test/CodeGen/PowerPC/2008-07-15-Bswap.ll

And I have another comment, maybe you can split this patch into two parts:
Patch 1: contain first three parts in your description

SelectionDAGBuilder::visitSelect handles the unary SelectPatternFlavor::SPF_ABS case to build ABS node.
Expand-based legalization of integer result for the ABS nodes.
Expand-based legalization of ABS vector operations.

Patch 2: handle X86 specific

What do you think?

lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
2135 ↗(On Diff #157427)

Sorry I don't understand what you mean by test cases are architecture-specific?
Yes, I see there are some cases for abs already in X86. But these cases pass without this patch. IMO, ideally there should be test cases for every patch with functionality changing to show what's current patch's improvement or fixing compared with trunk.

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
2918 ↗(On Diff #157427)

Yes, it would be more clear.

And I have another comment, maybe you can split this patch into two parts:
Patch 1: contain first three parts in your description

SelectionDAGBuilder::visitSelect handles the unary SelectPatternFlavor::SPF_ABS case to build ABS node.
Expand-based legalization of integer result for the ABS nodes.
Expand-based legalization of ABS vector operations.
Patch 2: handle X86 specific

What do you think?

It make sense. But we must take into account that patches should be applied together, because changes from the patch 1 requires the patch 2 for correct testing the x86 and the subsequent patches for the rest architectures.

lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
2135 ↗(On Diff #157427)

It is impossible to test only "Expand" without account subsequent "DAGCombining" and other DAG optimizations during the instruction selection for different architectures. I think, that the testing of ABS node with the different scalar/vector types for every architecture is better. Otherwise, there will be a lot of similar tests for every architecture that estimate the same thing.

RKSimon added inline comments.Jul 30 2018, 4:56 AM
lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
1388–1390 ↗(On Diff #157427)

Yes please - keep the styling consistent in cases like this (and we don't want to clang-format the whole thing as it makes blaming/diffs tricky).

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
3004 ↗(On Diff #157427)

Shouldn't we only do this if TLI.isOperationLegalOrCustom(Opc, VT) ? That probably removes the need for some of the legalization code from this initial patch.

ikulagin added inline comments.Jul 30 2018, 10:38 AM
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
3004 ↗(On Diff #157427)

Yes, I think so too, It removes the need for some legalization, but in this case for the absolute value code the DAGBuilder can produce 2 different DAGs (one of which is the select node and the other is the ABS node). This will lead to the more complex DAGCombiner, because the DAGCombiner should support the same patterns with different DAGs. For example, the X86 recognizer of SAD pattern will be needed to handle the case based on the SELECT node and the case based on the ABS node. I think that this makes code bloat with the redundant supported patterns.
Also, the DAG based on the SELECT has 3 nodes (SELECT, SUB, SETCC) and on the ABS only 1.

I have gone deeper into this task and I have some thoughts.
What if we make the DAGBuilder and the Legalization for the ABS more general? I mean that we can always generate the ABS and on the Legalization stage we can perform a simple transformation, but on the Combine stage we can produce the optimal architecture-specific DAG with more complexity transformations.
I think it will allow building a DAG with a smaller number of nodes and performing a legalization of a smaller number of nodes too.
What do you think about this?

ikulagin updated this revision to Diff 158522.Aug 1 2018, 7:01 AM
ikulagin marked 11 inline comments as done.Aug 1 2018, 7:01 AM
ikulagin edited the summary of this revision. (Show Details)

I have excluded the X86 SAD pattern recognizer from this patch. It will be in another diff.
Now, I am implementing the target-depended ABS handling and the test cases for it.
I will create a new diff for the target-depended handling + test cases over this patch.

ikulagin updated this revision to Diff 158965.Aug 3 2018, 3:36 AM
ikulagin retitled this revision from [SelectionDAG][X86] Handle unary SelectPatternFlavor for ABS case in SelectionDAGBuilder::visitSelect. to [SelectionDAG] Handle unary SelectPatternFlavor for ABS case in SelectionDAGBuilder::visitSelect..
ikulagin edited the summary of this revision. (Show Details)

Rebase the patch

craig.topper added inline comments.Aug 10 2018, 3:42 PM
lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
2147 ↗(On Diff #158965)

Isn't this wrong if the value of high is 0x80000000? The absolute value of that would be 0x80000000 which would cause us to leave the low bits unchanged. But if the full value is 0x80000000_00000001 then the absolute value should be 0x7fffffff_ffffffff.

@ikulagin You might want to look at https://reviews.llvm.org/D26357 which was my old attempt to add an Intrinsic::abs IR intrinsic - you might be able to reuse some of the legalizers from that?

@ikulagin You might want to look at https://reviews.llvm.org/D26357 which was my old attempt to add an Intrinsic::abs IR intrinsic - you might be able to reuse some of the legalizers from that?

Thank you for that differential. I'll use some of the legalizers.

lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
2147 ↗(On Diff #158965)

I'm sorry for the late response. Yes, this is wrong. I left out the case when the low/high parts contain values that are greater/smaller than the MAX/MIN for the given number of bits. I will redo it.

ikulagin updated this revision to Diff 163980.Sep 5 2018, 1:49 AM

Rebase and fix expand-based legalization of integer result for the ABS nodes.

RKSimon added inline comments.Sep 12 2018, 5:06 AM
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
2596 ↗(On Diff #163980)

Safer to use getScalaraSizeInBits - in case vector types ever use this.

RKSimon added inline comments.Sep 13 2018, 1:55 AM
lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
939 ↗(On Diff #163980)

Unless these ops are legal/custom (or promote for xor) on the vector type I think we need to unroll.

@ikulagin Are you still working on this? If not would you mind if I commandeer and complete this?

Are you still working on this?

Yes I am. I would like to do this. I have just waited for the feedback. If it is an urgent task I would like to continue my work.

RKSimon added inline comments.Oct 26 2018, 1:41 AM
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
2599 ↗(On Diff #163980)

You might want to consider creating just TargetLowering::ExpandABS and getting both SelectionDAGLegalize/VectorLegalizer to call it - see the other TargetLowering::Expand* functions for examples.

Changes in this patch require the changes in target specific parts (AArch64, X86, etc), otherwise
some tests will fail. Should I include these target specific changes into the current differential or into a separate one?
Now they are in separate differential.

Changes in this patch require the changes in target specific parts (AArch64, X86, etc), otherwise
some tests will fail. Should I include these target specific changes into the current differential or into a separate one?
Now they are in separate differential.

Better to merge them all into this patch - reviewers will be able to advise if anything can be cherry picked out then.

ikulagin marked 3 inline comments as done.Oct 27 2018, 5:49 AM
ikulagin updated this revision to Diff 171439.Oct 28 2018, 1:47 PM
ikulagin edited the summary of this revision. (Show Details)

Rebase and merge changes to one patch.

Well, I have to do changes for NVPTX backend, because the idioms.ll test failed due to the matchSelectPattern doesn't recognize the ABS pattern for that case.
(this test uses icmp sge but matchSelectPattern recognize only ICMP_SGT and ICMP_SLT cases).

Please pay attention to how I supported Arch and Thumb.
I wait feedback to correct this if it is need.

ikulagin updated this revision to Diff 171631.Oct 29 2018, 8:32 PM
ikulagin edited the summary of this revision. (Show Details)

Extend 'matchSelectPattern' to recognize the ABS patterns with ICMP_SGE condition.

RKSimon added inline comments.Oct 30 2018, 3:31 AM
lib/CodeGen/SelectionDAG/TargetLowering.cpp
4458 ↗(On Diff #171631)

The shift constant needs to use a getShiftAmountTy type

lib/Target/ARM/ARMISelLowering.cpp
1210 ↗(On Diff #171631)

Don't include commented out code

10309 ↗(On Diff #171631)

Just call TargetLowering::expandABS ?

lib/Target/X86/X86ISelLowering.cpp
26030 ↗(On Diff #171631)

Seems a shame this (and the ARM) code can't be put into the legalizers

test/CodeGen/Thumb/iabs.ll
76 ↗(On Diff #171631)

Add these new tests to trunk now so the patch shows the codegen diff

spatel added inline comments.Oct 30 2018, 8:41 AM
lib/Analysis/ValueTracking.cpp
4856 ↗(On Diff #171631)

This is saying that:
X >= -1 ? X : -X
...is an abs(). But that's not correct:
https://rise4fun.com/Alive/eVzs

This diff should be split into its own patch and include correctness tests. Preferably, the tests can go in:
llvm/unittests/Analysis/ValueTrackingTest.cpp

ikulagin added inline comments.Oct 30 2018, 12:44 PM
lib/Target/ARM/ARMISelLowering.cpp
1210 ↗(On Diff #171631)

Sorry for that, I did not notice.

lib/Target/X86/X86ISelLowering.cpp
26030 ↗(On Diff #171631)

Could you explain in more detail, or give some advice, please?

test/CodeGen/Thumb/iabs.ll
76 ↗(On Diff #171631)

It is already exist, I have created diff early -- D52138.

ikulagin added inline comments.Oct 30 2018, 1:30 PM
lib/Analysis/ValueTracking.cpp
4856 ↗(On Diff #171631)

Yes, for that pattern we need something like follow:

// (X >= 0) ? X : -X or (X >= 1) ? X : -x --> ABS(X)
if (Pred == ICmpInst::ICMP_SGE && match(CmpRHS, ZeroOrOne))
  return {SPF_ABS, SPNB_NA, false};

@ikulagin are you still looking at this?

I am waiting for feedback and I want to coordinate the activity.

@ikulagin are you still looking at this?

I am waiting for feedback and I want to coordinate the activity.

AFAICT you have outstanding fixes that you promised to complete?

@ikulagin Please can you rebase now that rL350998 has landed?

@ikulagin I'm keen to get this finished - do you have time to look at this or can I commandeer please?

@ikulagin I'm keen to get this finished - do you have time to look at this or can I commandeer please?

I have a outstanding few changes. I am planning rebase it and send patch by this Saturday. If the changes are needed earlier you can continue this task. What do you think?

SGTM - thanks for the update

ikulagin updated this revision to Diff 184908.Feb 2 2019, 12:11 PM

Rebase and some fixes.

There is a need to determine ways to legalize ABS for PowerPC. Now it fails on the tests ppc64-P9-vabsd.ll pre-inc-disable.ll.

Herald added a project: Restricted Project. · View Herald TranscriptFeb 2 2019, 12:11 PM

Rebase and some fixes.

There is a need to determine ways to legalize ABS for PowerPC. Now it fails on the tests ppc64-P9-vabsd.ll pre-inc-disable.ll.

Please can you include the current test change so the ppc guys can comment more easily

@inouehrs may also be able to advise - IIRC he was investigating powerpc absolutes at some point (for SAD - D46101).

ikulagin added a comment.EditedFeb 4 2019, 11:13 PM

@RKSimon I forgot to clarify, the PPC tests crashed on ppc64-P9-vabsd.ll: lsub_absv_16_ext and pre-inc-disable.ll: test_pre_inc_disable_1, because now the PPC uses custom lowering the ISD::ABS node for vector types,

for (MVT VT : MVT::vector_valuetypes()) {
  // add/sub are legal for all supported vector VT's.
  setOperationAction(ISD::ADD, VT, Legal);
  setOperationAction(ISD::SUB, VT, Legal);
  setOperationAction(ISD::ABS, VT, Custom);

but the custom lowering for this architecture is not specified.

ikulagin updated this revision to Diff 185244.Feb 4 2019, 11:21 PM

Rebase and several changes in Thumb test iabs.ll.

ikulagin updated this revision to Diff 188678.Feb 27 2019, 10:53 PM

Rebase and separate testcases for Thumb arch from this patch to D52138.

Do you have any offers to support ABS legalizing for PowerPC?

I will apply this patch and take a look at what breaks on PPC. I'll advise soon on what is needed for us.

OK, it appears that the PPC issue is really just a phase ordering problem that is quite easy to resolve. We now get the ISD::ABS node earlier so we may very well end up in a situation where we need to legalize the result type of such node. So our Custom legalization of all vector types can't really work any longer. Applying this in PPCISelLowering.cpp should fix the UNREACHABLE failure:

@@ -551,7 +552,9 @@ PPCTargetLowering::PPCTargetLowering(const PPCTarg
       // add/sub are legal for all supported vector VT's.
       setOperationAction(ISD::ADD, VT, Legal);
       setOperationAction(ISD::SUB, VT, Legal);
-      setOperationAction(ISD::ABS, VT, Custom);
+      // Custom lowering for legal vector result types.
+      if (VT.getSizeInBits() == 128)
+        setOperationAction(ISD::ABS, VT, Custom);

       // Vector instructions introduced in P8
       if (Subtarget.hasP8Altivec() && (VT.SimpleTy != MVT::v1i128)) {
ikulagin updated this revision to Diff 190509.Mar 13 2019, 2:29 PM

Applied @nemanjai's fixes and rebased.

RKSimon added a subscriber: efriedma.
RKSimon added inline comments.
llvm/test/CodeGen/AArch64/iabs.ll
46

@efriedma IIRC we hit this before?

efriedma added inline comments.Mar 14 2019, 4:23 PM
llvm/test/CodeGen/AArch64/iabs.ll
46

Yes; we intentionally marked "abs" as Expand for i64 to avoid this sort of sequence, since it's much higher latency. Not sure why this patch is reversing that decision.

ikulagin marked 3 inline comments as done.Mar 15 2019, 12:01 AM
ikulagin added inline comments.
llvm/test/CodeGen/AArch64/iabs.ll
46

it is, indeed, so. Latency 3 (abs) + 2*3(fmov) vs 1 (cneg) + 1(cmp). Fixed.

ikulagin updated this revision to Diff 190787.Mar 15 2019, 12:05 AM
ikulagin marked an inline comment as done.

Reverted legalization of ABS for i64 for AArch64.

RKSimon added inline comments.Mar 15 2019, 5:00 AM
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
554–555

@nemanjai Use VT.is128BitVector() ? Worth adding as a pre-commit now?

RKSimon accepted this revision.Mar 17 2019, 11:37 AM

I think this can go in now. The PPC ABS fix could go in as pre-commit but its not vital.

bogner accepted this revision.Mar 18 2019, 4:58 PM
This revision is now accepted and ready to land.Mar 18 2019, 4:58 PM

@ikulagin Do you have commit access ?

No I don't. Could you commit it, please?

This revision was automatically updated to reflect the committed changes.

This commit breaks the following IR:

target datalayout = "e-m:e-p:32:32-i64:64-n32:64-S128"
target triple = "wasm32-unknown-unknown"

define void @foo() local_unnamed_addr {
entry:
  %x0 = load <4 x i8>, <4 x i8>* undef, align 1
  %0 = load <4 x i8>, <4 x i8>* null, align 1
  %t6 = srem <4 x i8> %0, %x0
  %1 = ashr <4 x i8> %t6, <i8 7, i8 7, i8 7, i8 7>
  %2 = icmp slt <4 x i8> %x0, zeroinitializer
  %3 = sub <4 x i8> zeroinitializer, %x0
  %4 = select <4 x i1> %2, <4 x i8> %3, <4 x i8> %x0
  %5 = and <4 x i8> %1, %4
  %6 = add <4 x i8> %5, %t6
  store <4 x i8> %6, <4 x i8>* undef, align 1
  ret void
}

Running llc on that IR gives the following error:

ScalarizeVectorResult #0: t28: v1i8 = abs t26

LLVM ERROR: Do not know how to scalarize the result of this operator!

This was reported at https://bugs.llvm.org/show_bug.cgi?id=41149.

This commit also broke compilation of some files for me, for ARM, where compilation now doesn't finish (files that normally compile within a second now doesn't finish after half an hour. See https://bugs.llvm.org/show_bug.cgi?id=41160 for details and a reproduction sample.

This commit also broke compilation of some files for me, for ARM, where compilation now doesn't finish (files that normally compile within a second now doesn't finish after half an hour. See https://bugs.llvm.org/show_bug.cgi?id=41160 for details and a reproduction sample.

DAG combining loops forever after legalize types. I am looking for reason of it.

It would appear that the fix for PR 41149 should be fairly straightforward. The vector type legalizer just needs to learn about ISD::ABS. Likely just another case to be added in ScalarizeVectorResult().
PR 41160 is a common pitfall with DAG combines. I've gone ahead and reduced the test case provided in the PR.