Page MenuHomePhabricator

[PPC] exploitation of new xscmp*, as well as xsmaxcdp and xsmincdp
Needs ReviewPublic

Authored by syzaara on May 6 2016, 7:36 AM.



Some background on why this approach was chosen. This was discussed on Hall's Tuesday's call.

The choice of this approach for implementation is the result of how selectcc operation actions are handled. We first look at the operation action for condition code, and then based on that we look at operation action for selectcc. If we could have looked at both together, then we could have avoided expansion of selectcc, for data types that expansion is not needed, and do a simple pattern match in tablegen. Given that this is not what we currently do, we have some fairly complicated patterns reaching to instruction selection. Since these patterns has to be distinguished from the ones that we want to handle inside the PPCIselDAGtoDAG::Select, we practically need to do the full pattern matching in C++ code.

There are other approaches (for example, not expanding selectcc for floating point condition code during Operation Legalization and then adding a custom handler for vector and int data types). The reason that this approach was not taken is this: Selectcc handling is scattered in multiple places in the code. An approach like this, has the risk of breaking existing code for other data types and their corner cases and becoming a large project.

Diff Detail

Event Timeline

amehsan updated this revision to Diff 56414.EditedMay 6 2016, 7:36 AM
amehsan retitled this revision from to [PPC] initial exploitation of xs[min,max]cdp.
amehsan updated this object.
amehsan added a subscriber: llvm-commits.

I forgot to add tests for the -mattr=-power9-vector. Will add that.

amehsan updated this revision to Diff 56446.May 6 2016, 12:20 PM
nemanjai added inline comments.May 10 2016, 6:21 AM

Why the restriction to double precision values? The ISA document mentions this instruction can be used for both single and double precision operands.


Patterns for f32?


And if you add f32 above, test cases for float as well.

amehsan added inline comments.May 10 2016, 1:50 PM

Thanks. I missed the foot note, and I had totally forgot the fact that single precisions are represented in double precision format when stored in a register (talking about scalar operations), so I didn't make the conclusion myself.

I will post another review for exploitation of all instructions. That will subsume this one. The change is ready, I need to double check all cases are covered in the tests (there are more than 40 cases) and do some final clean up of the code.

amehsan updated this revision to Diff 56980.May 11 2016, 4:24 PM
amehsan retitled this revision from [PPC] initial exploitation of xs[min,max]cdp to [PPC] exploitation of new xscmp*, as well as xsmaxcdp and xsmincdp.
amehsan updated this object.
amehsan added inline comments.May 11 2016, 5:41 PM

I will add f32 to here. For other opcodes, codegen is done from within C++ code, and that handles both data types.

amehsan added inline comments.May 11 2016, 9:17 PM
6241–6244 ↗(On Diff #56980)

There are advantage and disadvantages in using fsel when one of the operands is zero. Please let me know if you have any comment here.

cycheng added inline comments.May 18 2016, 7:45 AM

Need define:

target triple = "powerpc64-unknown-linux-gnu"


llc -mtriple=powerpc64le-unknown-linux-gnu
nemanjai added inline comments.May 18 2016, 7:58 AM

Would it be appropriate to add an assert that N's opcode is correct (in case in the future this function is called from elsewhere)?

amehsan added inline comments.May 18 2016, 8:03 AM

Makes sense. Will do.

nemanjai added inline comments.May 18 2016, 9:33 AM

Perhaps a short comment describing what the purpose of this struct is.


Although the fall-through seems reasonable here, I think it's a good idea to add comments to that end. I'm not sure if everyone will agree with me though. So maybe others can chime in here as well.


Is it impossible that these operands do not exist? Namely, is it not possible that operand 1 of N does not have 3 operands thereby causing this call to assert for trying to get an invalid operand? Both here and below.


Same comment about fall-through.


Yes, the latter please. You should always specify the triple because I think other targets will get the "pwr9 is not a valid CPU for this target" message if you don't.

amehsan added inline comments.May 18 2016, 11:34 AM

Are you in doubt about functional correctness of fall-through or something else? functional correctness should be covered in the testcases. I will think about this again, to see if there are missing patterns in the testcases.


N has an opernad(0) because it is a select_cc. we have checked that N->getOperand(0).getOpcode() is and ISD::AND so it has operand (1). and we have checked that both N->getOperand(0).getOperand(0) and N->getOperand(0).getOperand(1) are SETCC so it has operand 0, 1 and 2.

nemanjai added inline comments.May 18 2016, 1:12 PM

No, the fall-through paths certainly seem fine. I'm only suggesting that fall-through occurrences in switch statements should be commented to inform the reader that this was intent rather than careless omission. I don't think it's even necessary to justify the fall-through (that should be left to the reader), just something as simple as

// fall-through

OK, excellent. I just didn't look through the early exit out of this conditional branch in detail. Although that brings me to a point I was initially going to post about the if statement above. I find that a descriptive comment for such involved conditional statements is invaluable.

Overall, it might be nice for this function to have a comment at the top describing all the kinds of DAGs it handles. I understand that we don't comment every possible DAG combine, but when the logic is not easy and straightforward to follow by reading the code, I find a descriptive comment goes a long way for readability.

amehsan added inline comments.May 18 2016, 1:16 PM

Sure, will add more comments to the code.

cycheng edited edge metadata.May 19 2016, 2:54 AM
Group1 Testcases:
define {double|float} @{max|min}_test{1|2}{_float}(%x, %y) #1
define {double|float} @{max|min}_test{1|2}{_float}_eq(%x, %y) #1
define {double|float} @fast_{max|min}_test{1|2}{_float}(%x, %y) #2
define {double|float} @fast_{max|min}_test{1|2}{_float}_eq(%x, %y) #2
Total: 8*4 = 32

Group2 Testcases:
define {double|float} @fast_{double|float}_{ugt|ult|ogt|olt|uge|ule|oge|ole}(%x, %y, %a, %b) #1
define {double|float} @nan_{double|float}_{ugt|ult|ogt|olt|uge|ule|oge|ole}(%x, %y, %a, %b) #2
Total: 16*2 = 32

Group3 Testcases:
define double @{one|oeq}_test{_fast}(%x, %y) {#1|#2}
Total: 4
  • The prefix 'fast_' for functions is because of #1 {"no-nans-fp-math"="true"} or #2 {"no-nans-fp-math"="false"}? because the naming rule is a little bit different between Group1 and Group2.
  • In Group2, how about unifying function naming when data type is double? I.e. omit "double" in function name when data type is double, as your 1st and 3rd test group naming rule.

Some other observations of test cases, just for reference

test cases for this statement:
  } else if (N->getOperand(0).getValueType() == MVT::i1) {

define {float|double} @fast_{min|max}_test{1|2}{_float}_eq(%x, %y) #2
define {float|double} @nan_{float|double}_{ugt|ult|oge|ole}(%x, %y, %a, %b) #2
define double @one_test(double %x, double %y) #2

Looks like we want to handle this pattern:

t23: {f32|f64} = select_cc t20, Constant:i1<0>, t8, t6, setne:ch
  t20: i1 = and t17, t19
  t17: i1 = setcc t4, t2, setXXX:ch
  t19: i1 = setcc t4, t2, seto:ch

I feel it is a little bit strange, if useP9VSXScalarComparisonInstr returns true, it should mean we can use p9vsx instructions for N, but actually even if the function returns true, we still have some cases that can't use p9vsx instructions.

Would it be better if:

if (useP9VSXScalarComparisonInstr(N, Summary)) {
  if (Summary.CC == ISD::CondCode::SETNE) { return ..; }
  if (Summary.Comp0 == Summary.Ret0 && ..) { return ..; }
  if (Summary.Comp0 == Summary.Ret1 && ..) { return ..; }
  if (CurDAG->getTarget().Options.NoNaNsFPMath) { return ..; }

So useP9VSXScalarComparisonInstr might need additional arguments to help it judge if N is able to use p9vsx instructions.


Do we need the 'fast' flag when we have "no-nans-fp-math"="true" attribute?


ugt -> ogt?


ugt -> ogt?


uge -> oge?


uge -> oge ?

Thanks CY for the comments on the test cases. I will review to make sure all right combinations are there and function names are consistent. The fast flag on fcmp IR instruction does not matter. We don't pay any attention to it when we construct Selection DAG. It is in my IR because they come from an example that was compiled with -ffast-math.

One of the outstanding issues in SelectionDAGs is that many ISD opcodes are not equipped with fast math flags. For a limited number of opcodes this issue has been fixed, but the work should be extended to all opcodes. While that is the better way of checking for fast math flags, it is a different issue than the what the current patch tries to achieve.


Good point. I will rename the function. It makes more sense to check for non-nan outside the function. So even when the function return true, there is a possibility that we do not want to use the new instructions.

amehsan updated this revision to Diff 58634.May 26 2016, 9:30 AM
amehsan edited edge metadata.
amehsan added inline comments.May 26 2016, 12:11 PM
177 ↗(On Diff #58634)

I think a feature A should not imply feature B that is a superset of A. I had forgot this point and wrote this code incorrectly, but it seems that "implies" part of a SubtargetFeature has been used incorrectly in features around the new one as well. We probably need to discuss this, to make sure we are all on the same page. In the meantime I will change my code here.

kbarton accepted this revision.Jul 11 2016, 11:09 AM
kbarton edited edge metadata.

With the exception of a few minor comments, this LGTM.


Could you please add a brief comment indicating what each field is meant to represent?


Please add doxygen-style comments, with a \brief and also the parameters and return values documented.


spelling: vairations -> variations


Indentation here is off. Is that intentional?


Extra blank line here. Please remove.


Replace this break with return true, unless there is something else that needs to be done before the return at the end of the function.


Replace break with return true.


Replace break with return true.


This is not initialized before passing to mayUseP9VSXScalarComparisonInstr.
Is the assumption that all fields will be set inside the mayUseP9VSXScalarComparisonInstr?

This revision is now accepted and ready to land.Jul 11 2016, 11:09 AM
amehsan edited edge metadata.Oct 4 2016, 11:30 AM
amehsan added a subscriber: echristo.

A couple of inline comments and one general question: With Nemanjai we've been saying that we didn't want to add subtarget features for every ISA addition of the default ISA, what's with this one? :)




Go ahead and document what's going on in each block here if you wouldn't mind.

6252 ↗(On Diff #58634)

Add a simple comment here would be nice.

syzaara commandeered this revision.Feb 2 2018, 8:16 AM
syzaara added a reviewer: amehsan.
syzaara requested review of this revision.Feb 6 2018, 11:38 AM
syzaara updated this revision to Diff 133058.

We have neglected this for a very long time. Just adding a comment to trickle it up to the top of the review queue and I plan to review it very soon.