This is an archive of the discontinued LLVM Phabricator instance.

Differential D20019

[PPC] exploitation of new xscmp*, as well as xsmaxcdp and xsmincdp
Needs Review · Public

Authored by syzaara on May 6 2016, 7:36 AM.

Details

Reviewers

wschmidt
cycheng
kbarton
hfinkel
nemanjai
amehsan

Summary

Some background on why this approach was chosen (this was discussed on Hall's Tuesday call).

This approach follows from how operation actions for selectcc are handled. We first look at the operation action for the condition code and then, based on that, at the operation action for selectcc. If we could look at both together, we could avoid expanding selectcc for data types that do not need expansion and do a simple pattern match in TableGen. Since that is not what we currently do, some fairly complicated patterns reach instruction selection. Because these patterns have to be distinguished from the ones we want to handle inside PPCDAGToDAGISel::Select, we practically need to do the full pattern matching in C++ code.

There are other approaches (for example, not expanding selectcc for floating-point condition codes during operation legalization and then adding a custom handler for vector and integer data types). That approach was not taken because selectcc handling is scattered across multiple places in the code. Such a change risks breaking existing code for other data types and their corner cases, and could become a large project.
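
As a rough illustration of what "full pattern matching in C++ code" involves, the sketch below classifies a select_cc-shaped record as a min or max candidate. It is a standalone model, not LLVM code: the `SelectCC` struct, the integer operand ids, and `classify` are hypothetical names, and only the operand layout (compared values, selected values, condition code) is borrowed from select_cc.

```cpp
#include <cassert>

// Hypothetical stand-in for a select_cc node. Operands are identified by
// small integer ids instead of SDValues.
enum class CondCode { OGT, OLT, Other };

struct SelectCC {
  int Lhs, Rhs;      // compared values (operands 0 and 1)
  int TrueV, FalseV; // selected values (operands 2 and 3)
  CondCode CC;       // condition code (operand 4)
};

enum class Kind { None, Max, Min };

// select_cc a, b, a, b, ogt picks the greater value (max);
// select_cc a, b, b, a, ogt picks the smaller one (min);
// the olt forms are the mirror images.
Kind classify(const SelectCC &N) {
  if (N.CC != CondCode::OGT && N.CC != CondCode::OLT)
    return Kind::None;
  bool Same = N.Lhs == N.TrueV && N.Rhs == N.FalseV;
  bool Swapped = N.Lhs == N.FalseV && N.Rhs == N.TrueV;
  if (!Same && !Swapped)
    return Kind::None;
  // "Greater" is selected for ogt with operands in order, or olt with them
  // swapped.
  bool PicksGreater = (N.CC == CondCode::OGT) == Same;
  return PicksGreater ? Kind::Max : Kind::Min;
}
```

For instance, `classify({1, 2, 1, 2, CondCode::OGT})` yields `Kind::Max`, which corresponds to `x > y ? x : y`.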

Diff Detail

Event Timeline

I forgot to add tests for -mattr=-power9-vector. Will add them.

amehsan updated this revision to Diff 56446. May 6 2016, 12:20 PM

nemanjai added inline comments. May 10 2016, 6:21 AM

lib/Target/PowerPC/PPCISelDAGToDAG.cpp
461	Why the restriction to double precision values? The ISA document mentions this instruction can be used for both single and double precision operands.
lib/Target/PowerPC/PPCInstrVSX.td
2069	Patterns for f32?
test/CodeGen/PowerPC/vsx-p9.ll
3	And if you add f32 above, test cases for float as well.

amehsan added inline comments. May 10 2016, 1:50 PM

lib/Target/PowerPC/PPCISelDAGToDAG.cpp
461	Thanks. I missed the footnote, and I had totally forgotten that single-precision values are represented in double-precision format when stored in a register (for scalar operations), so I did not draw that conclusion myself.
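
The point about scalar single-precision values can be checked outside the compiler: converting a float to double is exact in IEEE 754, so comparing the widened values gives the same answer as comparing the originals. The helper names below are made up for illustration, and the sketch says nothing about the register format itself.

```cpp
#include <cassert>
#include <cmath>

// True when comparing A and B as floats and as doubles gives identical
// results for <, > and ==. Widening float to double never rounds, so this
// should hold for every pair of inputs, NaNs included (all three comparisons
// are then false on both sides).
bool sameOrdering(float A, float B) {
  double DA = A, DB = B;
  return (A < B) == (DA < DB) && (A > B) == (DA > DB) &&
         (A == B) == (DA == DB);
}

// True when a float survives a round trip through double unchanged.
bool roundTrips(float A) {
  float Back = static_cast<float>(static_cast<double>(A));
  return std::isnan(A) ? std::isnan(Back) : Back == A;
}
```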

I will post another review for exploitation of all instructions. That will subsume this one. The change is ready; I need to double-check that all cases are covered in the tests (there are more than 40 cases) and do some final cleanup of the code.

amehsan updated this revision to Diff 56980. May 11 2016, 4:24 PM

amehsan retitled this revision from [PPC] initial exploitation of xs[min,max]cdp to [PPC] exploitation of new xscmp*, as well as xsmaxcdp and xsmincdp.

amehsan updated this object.

amehsan added inline comments. May 11 2016, 5:41 PM

lib/Target/PowerPC/PPCInstrVSX.td
1924–1930	I will add f32 here. For other opcodes, codegen is done from within C++ code, and that handles both data types.

amehsan added inline comments. May 11 2016, 9:17 PM

lib/Target/PowerPC/PPCISelLowering.cpp
6241–6244 ↗	(On Diff #56980)	There are advantages and disadvantages to using fsel when one of the operands is zero. Please let me know if you have any comments here.
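
For reference while weighing that trade-off, here is a small model of what fsel computes: it returns its second source when the first is greater than or equal to zero and its third source otherwise (a NaN selector takes the "otherwise" path). The function `fsel` below is a C++ stand-in for the instruction, and the signed-zero difference it demonstrates is only one possible consideration; the review does not say which advantages and disadvantages the author had in mind.

```cpp
#include <cassert>
#include <cmath>

// Model of the PowerPC fsel instruction: fsel FRT, FRA, FRC, FRB selects FRC
// when FRA >= 0.0 and FRB otherwise. A NaN in FRA fails the comparison, so
// FRB is chosen.
double fsel(double A, double C, double B) { return A >= 0.0 ? C : B; }

// max(X, 0.0) written as a plain C++ select.
double maxZeroSelect(double X) { return X > 0.0 ? X : 0.0; }

// The same operation expressed with the fsel model.
double maxZeroFsel(double X) { return fsel(X, X, 0.0); }
```

The two forms agree for ordinary inputs and for NaN, but for X = -0.0 the fsel form returns -0.0 while the plain select returns +0.0.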

cycheng added inline comments. May 18 2016, 7:45 AM

test/CodeGen/PowerPC/vsx-p9.ll
3	Need to define: target triple = "powerpc64-unknown-linux-gnu", or use: llc -mtriple=powerpc64le-unknown-linux-gnu

nemanjai added inline comments. May 18 2016, 7:58 AM

lib/Target/PowerPC/PPCISelDAGToDAG.cpp
462	Would it be appropriate to add an assert that N's opcode is correct (in case in the future this function is called from elsewhere)?

amehsan added inline comments. May 18 2016, 8:03 AM

lib/Target/PowerPC/PPCISelDAGToDAG.cpp
462	Makes sense. Will do.

nemanjai added inline comments. May 18 2016, 9:33 AM

lib/Target/PowerPC/PPCISelDAGToDAG.cpp
245	Perhaps a short comment describing what the purpose of this struct is.
487	Although the fall-through seems reasonable here, I think it's a good idea to add comments to that end. I'm not sure if everyone will agree with me though. So maybe others can chime in here as well.
517	Is it guaranteed that these operands exist? Namely, is it possible that operand 1 of N does not have 3 operands, causing this call to assert when it tries to get an invalid operand? Both here and below.
526	Same comment about fall-through.
test/CodeGen/PowerPC/vsx-p9.ll
3	Yes, the latter please. You should always specify the triple because I think other targets will get the "pwr9 is not a valid CPU for this target" message if you don't.

amehsan added inline comments. May 18 2016, 11:34 AM

lib/Target/PowerPC/PPCISelDAGToDAG.cpp
487	Are you in doubt about the functional correctness of the fall-through, or something else? Functional correctness should be covered by the test cases. I will think about this again to see whether any patterns are missing from the test cases.
517	N has an operand 0 because it is a select_cc. We have checked that N->getOperand(0).getOpcode() is ISD::AND, so it has an operand 1. And we have checked that both N->getOperand(0).getOperand(0) and N->getOperand(0).getOperand(1) are SETCC, so each has operands 0, 1 and 2.

nemanjai added inline comments. May 18 2016, 1:12 PM

lib/Target/PowerPC/PPCISelDAGToDAG.cpp
487	No, the fall-through paths certainly seem fine. I'm only suggesting that fall-through occurrences in switch statements should be commented to inform the reader that this was intent rather than careless omission. I don't think it's even necessary to justify the fall-through (that should be left to the reader), just something as simple as // fall-through
517	OK, excellent. I just didn't look through the early exit out of this conditional branch in detail. Although that brings me to a point I was initially going to post about the if statement above. I find that a descriptive comment for such involved conditional statements is invaluable. Overall, it might be nice for this function to have a comment at the top describing all the kinds of DAGs it handles. I understand that we don't comment every possible DAG combine, but when the logic is not easy and straightforward to follow by reading the code, I find a descriptive comment goes a long way for readability.
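
The fall-through suggestion above (line 487) amounts to something like the following. This is a self-contained sketch with an invented `Pred` enum and `direction` function, not the code under review; it only shows where a `// fall-through` marker would sit.

```cpp
#include <cassert>

enum class Pred { OGT, UGT, OLT, ULT };

// Returns +1 for the "greater" predicates and -1 for the "less" ones, and
// records in Ordered whether the predicate is the ordered flavor.
int direction(Pred P, bool &Ordered) {
  Ordered = false;
  switch (P) {
  case Pred::OGT:
    Ordered = true;
    // fall-through
  case Pred::UGT:
    return 1;
  case Pred::OLT:
    Ordered = true;
    // fall-through
  case Pred::ULT:
    return -1;
  }
  return 0; // not reached for valid enumerators
}
```

The marker does not justify the fall-through; it only tells the reader that the missing break is deliberate.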

amehsan added inline comments. May 18 2016, 1:16 PM

lib/Target/PowerPC/PPCISelDAGToDAG.cpp
517	Sure, will add more comments to the code.

Group1 Testcases:
define {double|float} @{max|min}_test{1|2}{_float}(%x, %y) #1
define {double|float} @{max|min}_test{1|2}{_float}_eq(%x, %y) #1
define {double|float} @fast_{max|min}_test{1|2}{_float}(%x, %y) #2
define {double|float} @fast_{max|min}_test{1|2}{_float}_eq(%x, %y) #2
Total: 8*4 = 32

Group2 Testcases:
define {double|float} @fast_{double|float}_{ugt|ult|ogt|olt|uge|ule|oge|ole}(%x, %y, %a, %b) #1
define {double|float} @nan_{double|float}_{ugt|ult|ogt|olt|uge|ule|oge|ole}(%x, %y, %a, %b) #2
Total: 16*2 = 32

Group3 Testcases:
define double @{one|oeq}_test{_fast}(%x, %y) {#1|#2}
Total: 4

Does the 'fast_' prefix on function names refer to #1 {"no-nans-fp-math"="true"} or to #2 {"no-nans-fp-math"="false"}? I ask because the naming rule differs slightly between Group1 and Group2.
In Group2, how about unifying the function naming when the data type is double? That is, omit "double" from the function name when the data type is double, as in the naming rule of your first and third test groups.

Some other observations of test cases, just for reference

test cases for this statement:
  } else if (N->getOperand(0).getValueType() == MVT::i1) {
    ..
  }

define {float|double} @fast_{min|max}_test{1|2}{_float}_eq(%x, %y) #2
define {float|double} @nan_{float|double}_{ugt|ult|oge|ole}(%x, %y, %a, %b) #2
define double @one_test(double %x, double %y) #2

lib/Target/PowerPC/PPCISelDAGToDAG.cpp
514	Looks like we want to handle this pattern:
  t23: {f32|f64} = select_cc t20, Constant:i1<0>, t8, t6, setne:ch
    t20: i1 = and t17, t19
      t17: i1 = setcc t4, t2, setXXX:ch
      t19: i1 = setcc t4, t2, seto:ch
2756	I feel it is a little bit strange: if useP9VSXScalarComparisonInstr returns true, it should mean we can use p9vsx instructions for N, but even when the function returns true there are still some cases that can't use p9vsx instructions. Would it be better if:
  if (useP9VSXScalarComparisonInstr(N, Summary)) {
    if (Summary.CC == ISD::CondCode::SETNE) {
      return ..;
    }
    if (Summary.Comp0 == Summary.Ret0 && ..) {
      return ..;
    }
    if (Summary.Comp0 == Summary.Ret1 && ..) {
      return ..;
    }
    if (CurDAG->getTarget().Options.NoNaNsFPMath) {
      return ..;
    }
    llvm_unreachable(..);
  }
So useP9VSXScalarComparisonInstr might need additional arguments to help it judge whether N is able to use p9vsx instructions.
test/CodeGen/PowerPC/vsx-p9.ll
493	Do we need the 'fast' flag when we have "no-nans-fp-math"="true" attribute?
538	ugt -> ogt?
659	ugt -> ogt?
779	uge -> oge?
900	uge -> oge ?
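
The pattern noted for line 514 above ANDs a setcc with a second setcc that uses seto (ordered, meaning neither operand is NaN). The sketch below shows, in plain C++, why that conjunction behaves like an ordered comparison. It assumes, purely for illustration, that setXXX is a "greater or equal" test that is also true for NaN inputs; the helper names are invented and nothing here is LLVM API.

```cpp
#include <cassert>
#include <cmath>

// seto: true when neither operand is NaN.
bool ordered(double A, double B) { return !std::isunordered(A, B); }

// Unordered-or-greater-or-equal: true when A >= B or either operand is NaN.
bool uge(double A, double B) { return !(A < B); }

// Ordered greater-or-equal: true only when A >= B and neither operand is NaN.
bool oge(double A, double B) { return A >= B; }

// The and-of-two-setcc form from the pattern.
bool andForm(double A, double B) { return uge(A, B) && ordered(A, B); }
```

With a NaN operand, `uge` is true but `andForm` and `oge` are both false, so the AND with seto is what removes the NaN case.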

Thanks CY for the comments on the test cases. I will review them to make sure all the right combinations are there and that function names are consistent. The fast flag on the fcmp IR instruction does not matter. We don't pay any attention to it when we construct the Selection DAG. It is in my IR because the IR comes from an example that was compiled with -ffast-math.

One of the outstanding issues in SelectionDAG is that many ISD opcodes are not equipped with fast-math flags. For a limited number of opcodes this issue has been fixed, but the work should be extended to all opcodes. While that is the better way of checking for fast-math flags, it is a different issue from what the current patch tries to achieve.

lib/Target/PowerPC/PPCISelDAGToDAG.cpp
2756	Good point. I will rename the function. It makes more sense to check for non-NaN outside the function. So even when the function returns true, there is a possibility that we do not want to use the new instructions.

amehsan updated this revision to Diff 58634. May 26 2016, 9:30 AM

amehsan edited edge metadata.

amehsan added inline comments. May 26 2016, 12:11 PM

lib/Target/PowerPC/PPC.td
177 ↗	(On Diff #58634)	I think a feature A should not imply a feature B that is a superset of A. I had forgotten this point and wrote this code incorrectly, but it seems that the "implies" part of a SubtargetFeature has been used incorrectly in the features around the new one as well. We probably need to discuss this to make sure we are all on the same page. In the meantime I will change my code here.

With the exception of a few minor comments, this LGTM.

lib/Target/PowerPC/PPCISelDAGToDAG.cpp
247	Could you please add a brief comment indicating what each field is meant to represent?
461	Please add doxygen-style comments, with a \brief and also the parameters and return values documented.
463	spelling: vairations -> variations
471	Indentation here is off. Is that intentional?
481	Extra blank line here. Please remove.
513	Replace this break with return true, unless there is something else that needs to be done before the return at the end of the function.
521	Replace break with return true.
555	Replace break with return true.
2755	This is not initialized before passing to mayUseP9VSXScalarComparisonInstr. Is the assumption that all fields will be set inside the mayUseP9VSXScalarComparisonInstr?
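
One common way to make the initialization question above moot is to give every field a default member initializer, so the object is in a known state even if the filling function returns early. The sketch below is an assumption-laden illustration: `ComparisonSummary` and `summarize` are hypothetical, integer ids stand in for SDValues, and only the field names CC, Comp0, Ret0 and Ret1 echo names mentioned earlier in the review.

```cpp
#include <cassert>

enum class CondCode { Invalid, SETOGT, SETOLT, SETNE };

// Hypothetical summary record with default member initializers.
struct ComparisonSummary {
  CondCode CC = CondCode::Invalid;
  int Comp0 = -1; // id of the first compared value
  int Ret0 = -1;  // id of the value selected when the condition holds
  int Ret1 = -1;  // id of the value selected otherwise
};

// Fills Summary and returns true only for condition codes it understands.
// On failure Summary is left untouched, i.e. in its default state if the
// caller has just declared it.
bool summarize(CondCode CC, int Comp0, int Ret0, int Ret1,
               ComparisonSummary &Summary) {
  if (CC == CondCode::Invalid)
    return false;
  Summary.CC = CC;
  Summary.Comp0 = Comp0;
  Summary.Ret0 = Ret0;
  Summary.Ret1 = Ret1;
  return true;
}
```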

This revision is now accepted and ready to land. Jul 11 2016, 11:09 AM

amehsan edited edge metadata. Oct 4 2016, 11:30 AM

amehsan added a subscriber: echristo.

A couple of inline comments and one general question: With Nemanjai we've been saying that we didn't want to add subtarget features for every ISA addition of the default ISA, what's with this one? :)

Thanks!

-eric

lib/Target/PowerPC/PPCISelDAGToDAG.cpp
2772–2801	Go ahead and document what's going on in each block here if you wouldn't mind.
lib/Target/PowerPC/PPCISelLowering.cpp
6252 ↗	(On Diff #58634)	Adding a simple comment here would be nice.

syzaara commandeered this revision. Feb 2 2018, 8:16 AM

syzaara added a reviewer: amehsan.

syzaara requested review of this revision. Feb 6 2018, 11:38 AM

syzaara updated this revision to Diff 133058.

jedilyn added a subscriber: jedilyn. Jul 26 2018, 6:54 PM

We have neglected this for a very long time. Just adding a comment to trickle it up to the top of the review queue and I plan to review it very soon.

Herald added a subscriber: jsji. Dec 29 2018, 3:28 PM

Revision Contents

Path	Size
lib/Target/PowerPC/PPCISelDAGToDAG.cpp	21 lines
lib/Target/PowerPC/PPCInstrVSX.td	17 lines
test/CodeGen/PowerPC/vsx-p9.ll	57 lines

Diff 56446

lib/Target/PowerPC/PPCISelDAGToDAG.cpp

[236 lines not shown; context: private:]
     SDValue combineToCMPB(SDNode *N);
     void foldBoolExts(SDValue &Res, SDNode *&N);

     bool AllUsersSelectZero(SDNode *N);
     void SwapAllSelectUsers(SDNode *N);

     SDNode *transferMemOperands(SDNode *N, SDNode *Result);
   };
 }

 /// InsertVRSaveCode - Once the entire function has been instruction selected,
 /// all virtual registers are created and all machine instructions are built,
 /// check to see if we need to save/restore VRSAVE. If so, do it.
 void PPCDAGToDAGISel::InsertVRSaveCode(MachineFunction &Fn) {
   // Check to see if this function uses vector registers, which means we have to
   // save and restore the VRSAVE register and update it with the regs we use.
   //
   // In this case, there will be virtual registers of vector type created
   // by the scheduler. Detect them now.
[197 lines not shown]
 // isOpcWithIntImmediate - This method tests to see if the node is a specific
 // opcode and that it has a immediate integer right operand.
 // If so Imm will receive the 32 bit value.
 static bool isOpcWithIntImmediate(SDNode *N, unsigned Opc, unsigned& Imm) {
   return N->getOpcode() == Opc
          && isInt32Immediate(N->getOperand(1).getNode(), Imm);
 }

+static bool isDoubleLegalMinOrMax(SDNode *N) {
+  if (N->getValueType(0) != MVT::f64)
+    return false;
+
+  ISD::CondCode CC = cast<CondCodeSDNode>(N->getOperand(4))->get();
+
+  if (CC != ISD::CondCode::SETOGT && CC != ISD::CondCode::SETOLT)
+    return false;
+
+  if (!((N->getOperand(0) == N->getOperand(2) &&
+         N->getOperand(1) == N->getOperand(3)) ||
+        (N->getOperand(0) == N->getOperand(3) &&
+         N->getOperand(1) == N->getOperand(2))))
+    return false;
+
+  return true;
+}

 SDNode *PPCDAGToDAGISel::getFrameIndex(SDNode *SN, SDNode *N, unsigned Offset) {
   SDLoc dl(SN);
   int FI = cast<FrameIndexSDNode>(N)->getIndex();
   SDValue TFI = CurDAG->getTargetFrameIndex(FI, N->getValueType(0));
   unsigned Opc = N->getValueType(0) == MVT::i32 ? PPC::ADDI : PPC::ADDI8;
   if (SN->hasOneUse())
     return CurDAG->SelectNodeTo(SN, Opc, N->getValueType(0), TFI,
                                 getSmallIPtrImm(Offset, dl));
   return CurDAG->getMachineNode(Opc, dl, N->getValueType(0), TFI,
                                 getSmallIPtrImm(Offset, dl));
 }

 bool PPCDAGToDAGISel::isRotateAndMask(SDNode *N, unsigned Mask,
                                       bool isShiftMask, unsigned &SH,
                                       unsigned &MB, unsigned &ME) {
   // Don't even go down this path for i64, since different logic will be
[9 lines not shown; context: if (N->getNumOperands() != 2 ||]
     return false;

   if (Opcode == ISD::SHL) {
     // apply shift left to mask if it comes first
     if (isShiftMask) Mask = Mask << Shift;
     // determine which bits are made indeterminant by shift
     Indeterminant = ~(0xFFFFFFFFu << Shift);
   } else if (Opcode == ISD::SRL) {
     // apply shift right to mask if it comes first
     if (isShiftMask) Mask = Mask >> Shift;
     // determine which bits are made indeterminant by shift
     Indeterminant = ~(0xFFFFFFFFu >> Shift);
     // adjust for the left rotate
     Shift = 32 - Shift;
   } else if (Opcode == ISD::ROTL) {
     Indeterminant = 0;
   } else {
     return false;
   }

   // if the mask doesn't intersect any Indeterminant bits
   if (Mask && !(Mask & Indeterminant)) {
     SH = Shift & 31;
     // make sure the mask is still a mask (wrap arounds may not be)
     return isRunOfOnes(Mask, MB, ME);
   }
   return false;
 }

 /// SelectBitfieldInsert - turn an or of two masked values into
[12 lines not shown; context: SDNode *PPCDAGToDAGISel::SelectBitfieldInsert(SDNode *N) {]

   if ((TargetMask | InsertMask) == 0xFFFFFFFF) {
     unsigned Op0Opc = Op0.getOpcode();
     unsigned Op1Opc = Op1.getOpcode();
     unsigned Value, SH = 0;
     TargetMask = ~TargetMask;
     InsertMask = ~InsertMask;

     // If the LHS has a foldable shift and the RHS does not, then swap it to the
     // RHS so that we can fold the shift into the insert.
     if (Op0Opc == ISD::AND && Op1Opc == ISD::AND) {
       if (Op0.getOperand(0).getOpcode() == ISD::SHL ||
           Op0.getOperand(0).getOpcode() == ISD::SRL) {
         if (Op1.getOperand(0).getOpcode() != ISD::SHL &&
             Op1.getOperand(0).getOpcode() != ISD::SRL) {
           std::swap(Op0, Op1);
           std::swap(Op0Opc, Op1Opc);
▲ Show 20 Lines • Show All 2,181 Lines • ▼ Show 20 Lines	SDValue SRIdxVal =
CurDAG->getTargetConstant(N->getOpcode() == PPCISD::ANDIo_1_EQ_BIT ?		CurDAG->getTargetConstant(N->getOpcode() == PPCISD::ANDIo_1_EQ_BIT ?
PPC::sub_eq : PPC::sub_gt, dl, MVT::i32);		PPC::sub_eq : PPC::sub_gt, dl, MVT::i32);

return CurDAG->SelectNodeTo(N, TargetOpcode::EXTRACT_SUBREG, MVT::i1,		return CurDAG->SelectNodeTo(N, TargetOpcode::EXTRACT_SUBREG, MVT::i1,
CR0Reg, SRIdxVal,		CR0Reg, SRIdxVal,
SDValue(AndI.getNode(), 1) /* glue */);		SDValue(AndI.getNode(), 1) /* glue */);
}		}
case ISD::SELECT_CC: {		case ISD::SELECT_CC: {
		if (PPCSubTarget->hasP9Vector() && isDoubleLegalMinOrMax(N))
		break;
ISD::CondCode CC = cast<CondCodeSDNode>(N->getOperand(4))->get();		ISD::CondCode CC = cast<CondCodeSDNode>(N->getOperand(4))->get();
		kbartonUnsubmitted Not Done Reply Inline Actions This is not initialized before passing to mayUseP9VSXScalarComparisonInstr. Is the assumption that all fields will be set inside the mayUseP9VSXScalarComparisonInstr? kbarton: This is not initialized before passing to mayUseP9VSXScalarComparisonInstr. Is the assumption…
EVT PtrVT =		EVT PtrVT =
		cychengUnsubmitted Not Done Reply Inline Actions I feel it is a little bit strange, if useP9VSXScalarComparisonInstr returns true, it should mean we can use p9vsx instructions for N, but actually even if the function returns true, we still have some cases that can't use p9vsx instructions. Would it be better if: if (useP9VSXScalarComparisonInstr(N, Summary)) { if (Summary.CC == ISD::CondCode::SETNE) { return ..; } if (Summary.Comp0 == Summary.Ret0 && ..) { return ..; } if (Summary.Comp0 == Summary.Ret1 && ..) { return ..; } if (CurDAG->getTarget().Options.NoNaNsFPMath) { return ..; } llvm_unreachable(..); } So useP9VSXScalarComparisonInstr might need additional arguments to help it judge if N is able to use p9vsx instructions. cycheng: I feel it is a little bit strange, if useP9VSXScalarComparisonInstr returns true, it should…
		amehsanUnsubmitted Not Done Reply Inline Actions Good point. I will rename the function. It makes more sense to check for non-nan outside the function. So even when the function return true, there is a possibility that we do not want to use the new instructions. amehsan: Good point. I will rename the function. It makes more sense to check for non-nan outside the…
CurDAG->getTargetLoweringInfo().getPointerTy(CurDAG->getDataLayout());		CurDAG->getTargetLoweringInfo().getPointerTy(CurDAG->getDataLayout());
bool isPPC64 = (PtrVT == MVT::i64);		bool isPPC64 = (PtrVT == MVT::i64);

// If this is a select of i1 operands, we'll pattern match it.		// If this is a select of i1 operands, we'll pattern match it.
if (PPCSubTarget->useCRBits() &&		if (PPCSubTarget->useCRBits() &&
N->getOperand(0).getValueType() == MVT::i1)		N->getOperand(0).getValueType() == MVT::i1)
break;		break;

// Handle the setcc cases here. select_cc lhs, 0, 1, 0, cc		// Handle the setcc cases here. select_cc lhs, 0, 1, 0, cc
if (!isPPC64)		if (!isPPC64)
if (ConstantSDNode *N1C = dyn_cast<ConstantSDNode>(N->getOperand(1)))		if (ConstantSDNode *N1C = dyn_cast<ConstantSDNode>(N->getOperand(1)))
if (ConstantSDNode *N2C = dyn_cast<ConstantSDNode>(N->getOperand(2)))		if (ConstantSDNode *N2C = dyn_cast<ConstantSDNode>(N->getOperand(2)))
if (ConstantSDNode *N3C = dyn_cast<ConstantSDNode>(N->getOperand(3)))		if (ConstantSDNode *N3C = dyn_cast<ConstantSDNode>(N->getOperand(3)))
if (N1C->isNullValue() && N3C->isNullValue() &&		if (N1C->isNullValue() && N3C->isNullValue() &&
N2C->getZExtValue() == 1ULL && CC == ISD::SETNE &&		N2C->getZExtValue() == 1ULL && CC == ISD::SETNE &&
// FIXME: Implement this optzn for PPC64.		// FIXME: Implement this optzn for PPC64.
N->getValueType(0) == MVT::i32) {		N->getValueType(0) == MVT::i32) {
SDNode *Tmp =		SDNode *Tmp =
CurDAG->getMachineNode(PPC::ADDIC, dl, MVT::i32, MVT::Glue,		CurDAG->getMachineNode(PPC::ADDIC, dl, MVT::i32, MVT::Glue,
N->getOperand(0), getI32Imm(~0U, dl));		N->getOperand(0), getI32Imm(~0U, dl));
return CurDAG->SelectNodeTo(N, PPC::SUBFE, MVT::i32,		return CurDAG->SelectNodeTo(N, PPC::SUBFE, MVT::i32,
SDValue(Tmp, 0), N->getOperand(0),		SDValue(Tmp, 0), N->getOperand(0),
SDValue(Tmp, 1));		SDValue(Tmp, 1));
}		}

SDValue CCReg = SelectCC(N->getOperand(0), N->getOperand(1), CC, dl);		SDValue CCReg = SelectCC(N->getOperand(0), N->getOperand(1), CC, dl);

if (N->getValueType(0) == MVT::i1) {		if (N->getValueType(0) == MVT::i1) {
// An i1 select is: (c & t) \| (!c & f).		// An i1 select is: (c & t) \| (!c & f).
bool Inv;		bool Inv;
unsigned Idx = getCRIdxForSetCC(CC, Inv);		unsigned Idx = getCRIdxForSetCC(CC, Inv);

unsigned SRI;		unsigned SRI;
switch (Idx) {		switch (Idx) {
default: llvm_unreachable("Invalid CC index");		default: llvm_unreachable("Invalid CC index");
case 0: SRI = PPC::sub_lt; break;		case 0: SRI = PPC::sub_lt; break;
case 1: SRI = PPC::sub_gt; break;		case 1: SRI = PPC::sub_gt; break;
case 2: SRI = PPC::sub_eq; break;		case 2: SRI = PPC::sub_eq; break;
case 3: SRI = PPC::sub_un; break;		case 3: SRI = PPC::sub_un; break;
}		}

SDValue CCBit = CurDAG->getTargetExtractSubreg(SRI, dl, MVT::i1, CCReg);		SDValue CCBit = CurDAG->getTargetExtractSubreg(SRI, dl, MVT::i1, CCReg);

SDValue NotCCBit(CurDAG->getMachineNode(PPC::CRNOR, dl, MVT::i1,		SDValue NotCCBit(CurDAG->getMachineNode(PPC::CRNOR, dl, MVT::i1,
CCBit, CCBit), 0);		CCBit, CCBit), 0);
		echristoUnsubmitted Not Done Reply Inline Actions Go ahead and document what's going on in each block here if you wouldn't mind. echristo: Go ahead and document what's going on in each block here if you wouldn't mind.
SDValue C = Inv ? NotCCBit : CCBit,		SDValue C = Inv ? NotCCBit : CCBit,
NotC = Inv ? CCBit : NotCCBit;		NotC = Inv ? CCBit : NotCCBit;

SDValue CAndT(CurDAG->getMachineNode(PPC::CRAND, dl, MVT::i1,		SDValue CAndT(CurDAG->getMachineNode(PPC::CRAND, dl, MVT::i1,
C, N->getOperand(2)), 0);		C, N->getOperand(2)), 0);
SDValue NotCAndF(CurDAG->getMachineNode(PPC::CRAND, dl, MVT::i1,		SDValue NotCAndF(CurDAG->getMachineNode(PPC::CRAND, dl, MVT::i1,
NotC, N->getOperand(3)), 0);		NotC, N->getOperand(3)), 0);
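
The i1 select in this hunk relies on the identity select(c, t, f) = (c & t) | (!c & f), with `Inv` deciding whether the extracted CR bit or its complement plays the role of `c`. A minimal standalone sketch of that logic follows. The names `selectI1` and `matchesTernaryEverywhere` are hypothetical and not part of the patch, and the final OR is assumed from the code comment, since the combining instruction falls outside the lines shown here.

```cpp
#include <cassert>

// Hypothetical helper (not part of the patch): models the CR-bit select.
// crBit: the bit extracted from the condition register field.
// inv:   true when the condition code tests the complement of that bit.
constexpr bool selectI1(bool crBit, bool inv, bool t, bool f) {
  // C    = inv ? !crBit : crBit   (mirrors "Inv ? NotCCBit : CCBit")
  // NotC = inv ? crBit : !crBit   (mirrors "Inv ? CCBit : NotCCBit")
  return ((inv ? !crBit : crBit) && t) || ((inv ? crBit : !crBit) && f);
}

// Checks the identity against a plain ternary for all 16 input combinations.
inline bool matchesTernaryEverywhere() {
  for (int bits = 0; bits < 16; ++bits) {
    const bool crBit = bits & 1, inv = bits & 2, t = bits & 4, f = bits & 8;
    const bool c = (crBit != inv); // effective condition after inversion
    if (selectI1(crBit, inv, t, f) != (c ? t : f))
      return false;
  }
  return true;
}
```

Since all operands are single bits, the whole select can stay in CR-logical instructions without a branch, which appears to be the motivation for this form.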


lib/Target/PowerPC/PPCInstrVSX.td

		let Predicates = [HasP9Vector] in {
def XSCMPEQDP : XX3_XT5_XA5_XB5<60, 3, "xscmpeqdp", vsrc, vsfrc, vsfrc,		def XSCMPEQDP : XX3_XT5_XA5_XB5<60, 3, "xscmpeqdp", vsrc, vsfrc, vsfrc,
IIC_FPCompare, []>;		IIC_FPCompare, []>;
def XSCMPGEDP : XX3_XT5_XA5_XB5<60, 19, "xscmpgedp", vsrc, vsfrc, vsfrc,		def XSCMPGEDP : XX3_XT5_XA5_XB5<60, 19, "xscmpgedp", vsrc, vsfrc, vsfrc,
IIC_FPCompare, []>;		IIC_FPCompare, []>;
def XSCMPGTDP : XX3_XT5_XA5_XB5<60, 11, "xscmpgtdp", vsrc, vsfrc, vsfrc,		def XSCMPGTDP : XX3_XT5_XA5_XB5<60, 11, "xscmpgtdp", vsrc, vsfrc, vsfrc,
IIC_FPCompare, []>;		IIC_FPCompare, []>;
def XSCMPNEDP : XX3_XT5_XA5_XB5<60, 27, "xscmpnedp", vsrc, vsfrc, vsfrc,		def XSCMPNEDP : XX3_XT5_XA5_XB5<60, 27, "xscmpnedp", vsrc, vsfrc, vsfrc,
IIC_FPCompare, []>;		IIC_FPCompare, []>;
// Vector Compare Not Equal		// Vector Compare Not Equal
def XVCMPNEDP : XX3Form_Rc<60, 123,		def XVCMPNEDP : XX3Form_Rc<60, 123,
(outs vsrc:$XT), (ins vsrc:$XA, vsrc:$XB),		(outs vsrc:$XT), (ins vsrc:$XA, vsrc:$XB),
"xvcmpnedp $XT, $XA, $XB", IIC_VecFPCompare, []>;		"xvcmpnedp $XT, $XA, $XB", IIC_VecFPCompare, []>;
let Defs = [CR6] in		let Defs = [CR6] in
def XVCMPNEDPo : XX3Form_Rc<60, 123,		def XVCMPNEDPo : XX3Form_Rc<60, 123,
(outs vsrc:$XT), (ins vsrc:$XA, vsrc:$XB),		(outs vsrc:$XT), (ins vsrc:$XA, vsrc:$XB),
amehsan: I will add f32 here. For the other opcodes, codegen is done from within the C++ code, and that handles both data types.
"xvcmpnedp. $XT, $XA, $XB", IIC_VecFPCompare, []>,		"xvcmpnedp. $XT, $XA, $XB", IIC_VecFPCompare, []>,
isDOT;		isDOT;
def XVCMPNESP : XX3Form_Rc<60, 91,		def XVCMPNESP : XX3Form_Rc<60, 91,
(outs vsrc:$XT), (ins vsrc:$XA, vsrc:$XB),		(outs vsrc:$XT), (ins vsrc:$XA, vsrc:$XB),
"xvcmpnesp $XT, $XA, $XB", IIC_VecFPCompare, []>;		"xvcmpnesp $XT, $XA, $XB", IIC_VecFPCompare, []>;
let Defs = [CR6] in		let Defs = [CR6] in
def XVCMPNESPo : XX3Form_Rc<60, 91,		def XVCMPNESPo : XX3Form_Rc<60, 91,
(outs vsrc:$XT), (ins vsrc:$XA, vsrc:$XB),		(outs vsrc:$XT), (ins vsrc:$XA, vsrc:$XB),
		def XVTSTDCSP : XX2_RD6_DCMX7_RS6<60, 13, 5,
"xvtstdcsp $XT, $XB, $DCMX", IIC_VecFP, []>;		"xvtstdcsp $XT, $XB, $DCMX", IIC_VecFP, []>;
def XVTSTDCDP : XX2_RD6_DCMX7_RS6<60, 15, 5,		def XVTSTDCDP : XX2_RD6_DCMX7_RS6<60, 15, 5,
(outs vsrc:$XT), (ins u7imm:$DCMX, vsrc:$XB),		(outs vsrc:$XT), (ins u7imm:$DCMX, vsrc:$XB),
"xvtstdcdp $XT, $XB, $DCMX", IIC_VecFP, []>;		"xvtstdcdp $XT, $XB, $DCMX", IIC_VecFP, []>;

//===--------------------------------------------------------------------===//		//===--------------------------------------------------------------------===//

// Maximum/Minimum Type-C/Type-J DP		// Maximum/Minimum Type-C/Type-J DP
// XT.dword[1] = 0xUUUU_UUUU_UUUU_UUUU, so we use vsrc for XT		def XSMAXCDP : XX3_XT5_XA5_XB5<60, 128, "xsmaxcdp", vsfrc, vsfrc, vsfrc,
def XSMAXCDP : XX3_XT5_XA5_XB5<60, 128, "xsmaxcdp", vsrc, vsfrc, vsfrc,		IIC_VecFP, [(set f64:$XT,
IIC_VecFP, []>;		(selectcc f64:$XA, f64:$XB,
		f64:$XA, f64:$XB, SETOGT))]>;
def XSMAXJDP : XX3_XT5_XA5_XB5<60, 144, "xsmaxjdp", vsrc, vsfrc, vsfrc,		def XSMAXJDP : XX3_XT5_XA5_XB5<60, 144, "xsmaxjdp", vsrc, vsfrc, vsfrc,
IIC_VecFP, []>;		IIC_VecFP, []>;
def XSMINCDP : XX3_XT5_XA5_XB5<60, 136, "xsmincdp", vsrc, vsfrc, vsfrc,		def XSMINCDP : XX3_XT5_XA5_XB5<60, 136, "xsmincdp", vsfrc, vsfrc, vsfrc,
IIC_VecFP, []>;		IIC_VecFP, [(set f64:$XT,
		(selectcc f64:$XA, f64:$XB,
		f64:$XA, f64:$XB, SETOLT))]>;
def XSMINJDP : XX3_XT5_XA5_XB5<60, 152, "xsminjdp", vsrc, vsfrc, vsfrc,		def XSMINJDP : XX3_XT5_XA5_XB5<60, 152, "xsminjdp", vsrc, vsfrc, vsfrc,
IIC_VecFP, []>;		IIC_VecFP, []>;

		def : Pat <(f64 (selectcc f64:$XB, f64:$XA, f64:$XA, f64:$XB, SETOLT)),
nemanjai: Patterns for f32?
		(XSMAXCDP $XA, $XB)>;
		def : Pat <(f64 (selectcc f64:$XB, f64:$XA, f64:$XA, f64:$XB, SETOGT)),
		(XSMINCDP $XA, $XB)>;
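
The two extra `Pat` entries cover the operand-swapped spelling of the same max/min selects. A small sketch, assuming `selectcc(lhs, rhs, tval, fval, cc)` means `(lhs cc rhs) ? tval : fval` with ordered comparisons, shows that the direct and swapped forms agree, including when an input is NaN. All function names below are illustrative only.

```cpp
#include <cassert>
#include <cmath>

// Ordered comparisons: C++ '>' and '<' on doubles are false if either side is NaN,
// which matches SETOGT / SETOLT.
inline double selectccOGT(double lhs, double rhs, double tval, double fval) {
  return (lhs > rhs) ? tval : fval;
}
inline double selectccOLT(double lhs, double rhs, double tval, double fval) {
  return (lhs < rhs) ? tval : fval;
}

// Form used in the instruction definitions.
inline double maxDirect(double a, double b) { return selectccOGT(a, b, a, b); }
inline double minDirect(double a, double b) { return selectccOLT(a, b, a, b); }

// Form used in the standalone Pat entries (compare operands swapped).
inline double maxSwapped(double a, double b) { return selectccOLT(b, a, a, b); }
inline double minSwapped(double a, double b) { return selectccOGT(b, a, a, b); }

// Equality that treats two NaNs as the same result.
inline bool sameValue(double x, double y) {
  return (std::isnan(x) && std::isnan(y)) || x == y;
}
```

In both spellings a NaN input makes the comparison false, so the second operand is returned. That is why the swapped form can map to the same instruction with the same operand order.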
//===--------------------------------------------------------------------===//		//===--------------------------------------------------------------------===//

// Vector Byte-Reverse H/W/D/Q Word		// Vector Byte-Reverse H/W/D/Q Word
def XXBRH : XX2_XT6_XO5_XB6<60, 7, 475, "xxbrh", vsrc, []>;		def XXBRH : XX2_XT6_XO5_XB6<60, 7, 475, "xxbrh", vsrc, []>;
def XXBRW : XX2_XT6_XO5_XB6<60, 15, 475, "xxbrw", vsrc, []>;		def XXBRW : XX2_XT6_XO5_XB6<60, 15, 475, "xxbrw", vsrc, []>;
def XXBRD : XX2_XT6_XO5_XB6<60, 23, 475, "xxbrd", vsrc, []>;		def XXBRD : XX2_XT6_XO5_XB6<60, 23, 475, "xxbrd", vsrc, []>;
def XXBRQ : XX2_XT6_XO5_XB6<60, 31, 475, "xxbrq", vsrc, []>;		def XXBRQ : XX2_XT6_XO5_XB6<60, 31, 475, "xxbrq", vsrc, []>;


test/CodeGen/PowerPC/vsx-p9.ll

This file was added.

				; RUN: llc -mcpu=pwr9 -mattr=+power9-vector < %s | FileCheck %s

				define double @max_test1(double %x, double %y) #0 {
nemanjai: And if you add f32 above, test cases for float as well.
cycheng: Need define: target triple = "powerpc64-unknown-linux-gnu" or: llc -mtriple=powerpc64le-unknown-linux-gnu
nemanjai: Yes, the latter please. You should always specify the triple because I think other targets will get the "pwr9 is not a valid CPU for this target" message if you don't.

				; CHECK-LABEL: @max_test1

				entry:
				%cmp = fcmp ogt double %x, %y
				%x.y = select i1 %cmp, double %x, double %y
				ret double %x.y

				; CHECK: xsmaxcdp 1, 1, 2
				; CHECK: blr

				}

				define double @max_test2(double %x, double %y) #0 {

				; CHECK-LABEL: @max_test2

				entry:
				%cmp = fcmp olt double %x, %y
				%y.x = select i1 %cmp, double %y, double %x
				ret double %y.x

				; CHECK: xsmaxcdp 1, 2, 1
				; CHECK: blr

				}

				define double @min_test1(double %x, double %y) #0 {

				; CHECK-LABEL: @min_test1

				entry:
				%cmp = fcmp ogt double %x, %y
				%y.x = select i1 %cmp, double %y, double %x
				ret double %y.x

				; CHECK: xsmincdp 1, 2, 1
				; CHECK: blr

				}

				define double @min_test2(double %x, double %y) #0 {

				; CHECK-LABEL: @min_test2

				entry:
				%cmp = fcmp olt double %x, %y
				%x.y = select i1 %cmp, double %x, double %y
				ret double %x.y

				; CHECK: xsmincdp 1, 1, 2
				; CHECK: blr

				}
cycheng: uge -> oge ?
cycheng: Do we need the 'fast' flag when we have "no-nans-fp-math"="true" attribute?
cycheng: uge -> oge?
cycheng: ugt -> ogt?
cycheng: ugt -> ogt?
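
The "uge -> oge" questions turn on how ordered and unordered predicates treat NaN: an ordered compare is false when either operand is NaN, while an unordered compare is true. The sketch below models the two `fcmp` flavors in plain C++ to show where a select built on them diverges. The function names are hypothetical, and the test bodies these comments refer to are not among the lines shown here.

```cpp
#include <cassert>
#include <cmath>

// fcmp oge: true only if neither operand is NaN and a >= b.
inline bool fcmpOGE(double a, double b) { return a >= b; }

// fcmp uge: true if either operand is NaN, or a >= b.
inline bool fcmpUGE(double a, double b) {
  return std::isnan(a) || std::isnan(b) || a >= b;
}

// select i1 %cmp, double %a, double %b for each predicate.
inline double selectOGE(double a, double b) { return fcmpOGE(a, b) ? a : b; }
inline double selectUGE(double a, double b) { return fcmpUGE(a, b) ? a : b; }
```

For non-NaN inputs the two selects agree. With a NaN in the first operand, the ordered form returns the second operand while the unordered form returns the NaN, so the two predicates are only interchangeable when NaNs are known to be absent.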