This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contents

- llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp
- llvm/test/CodeGen/PowerPC/2016-04-17-combine.ll
- llvm/test/CodeGen/PowerPC/Frames-dyn-alloca.ll
- llvm/test/CodeGen/PowerPC/and-mask.ll
- llvm/test/CodeGen/PowerPC/cmpb.ll
- llvm/test/CodeGen/PowerPC/setcc-logic.ll
Differential D71831

[PowerPC] Exploit the rldicl + rldicl when and with mask
Closed, Public

Authored by steven.zhang on Dec 23 2019, 1:12 AM.

Details

Reviewers

jsji
nemanjai
hfinkel
shchenz

Group Reviewers

Restricted Project

Commits

rG4bd186c0ff76: [PowerPC] Exploit the rldicl + rldicl when and with mask

Summary

When we AND a value with a constant such as 0xFFFFFFC00000, we currently use several instructions to materialize the 48-bit constant, followed by a final "and". However, the same result can be obtained with just two rotate instructions.

       MB          ME               MB+63-ME 
+----------------------+     +----------------------+
|0000001111111111111000| ->  |0000000001111111111111|
+----------------------+     +----------------------+
 0                    64      0                    64

First rotate left by ME + 1 bits, then mask the result with (MB + 63 - ME, 63), and finally rotate back. Note that the arithmetic must wrap modulo 64 to handle the wrapping case.
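As a rough illustration of the idea in the summary, the sketch below emulates `rldicl` on a plain 64-bit integer and checks that the sequence expected for the 0xFFFFFFC00000 mask in and-mask.ll (`rldicl 3, 3, 42, 22` followed by `rldicl 3, 3, 22, 16`) behaves like a single AND. The helpers `rotl64`, `rldicl` and `andViaTwoRldicl` are hypothetical names for this illustration, not LLVM APIs, and the model assumes the ISA semantics of "rotate left by sh, then clear the mb most significant bits".

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: rotate a 64-bit value left by sh bits (0 <= sh <= 63).
constexpr uint64_t rotl64(uint64_t x, unsigned sh) {
  return (x << (sh & 63)) | (x >> ((64 - sh) & 63));
}

// Hypothetical model of rldicl: rotate left by sh, then clear the mb most
// significant bits (0 <= mb <= 63).
constexpr uint64_t rldicl(uint64_t x, unsigned sh, unsigned mb) {
  return rotl64(x, sh) & (~uint64_t{0} >> mb);
}

// Two-instruction replacement for "x & 0xFFFFFFC00000".
// First rldicl: move the low 22 bits to the top and clear them.
// Second rldicl: rotate back and clear the 16 leading bits of the mask.
constexpr uint64_t andViaTwoRldicl(uint64_t x) {
  return rldicl(rldicl(x, 42, 22), 22, 16);
}

static_assert(andViaTwoRldicl(~uint64_t{0}) == 0xFFFFFFC00000ULL,
              "all-ones input must reproduce the mask itself");
static_assert(andViaTwoRldicl(0x0123456789ABCDEFULL) ==
                  (0x0123456789ABCDEFULL & 0xFFFFFFC00000ULL),
              "arbitrary input must match a plain AND");
```

The two `static_assert`s only spot-check two inputs; they are not a proof for every mask the patch accepts.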

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

steven.zhang created this revision. · Dec 23 2019, 1:12 AM

Herald added a project: Restricted Project. · Dec 23 2019, 1:12 AM

Herald added subscribers: wuzish, kbarton, hiraditya.

steven.zhang added a parent revision: D71829: [PowerPC] Exploit the rlwinm instructions for "and" with constant.. · Dec 23 2019, 1:13 AM

Rebase the patch. Testing with SPEC shows a reduction of about 4k instructions.

Rebase the patch.

steven.zhang edited parent revisions, added: D72250: [NFC][PowerPC] Refactor the tryAndWithMask(); removed: D71829: [PowerPC] Exploit the rlwinm instructions for "and" with constant.. · Feb 17 2020, 7:24 PM

steven.zhang updated this revision to Diff 245080. · Feb 17 2020, 9:42 PM

steven.zhang added a child revision: D71891: [PowerPC] Exploit the rlwinm + rlwinm when "and" with constant. · Feb 17 2020, 9:51 PM

ping

Sorry for the delayed comments.

llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp
4452	`andis.` in the .td file handles a u32imm such as 0xABCD0000; will this make that kind of immediate worse?
4464	s/64/63?
4468	ME could be 63, so putting 63 + 1 here seems unreasonable.
4475	I personally think the logic for the special case is a little hard to follow. I guess you want to convert an imm such as 0x00ffffffffff00ff. Can we first treat it as imm' 0xffffffffffff00ff and use the normal case as above? Then we get:

Val = SDValue(CurDAG->getMachineNode(PPC::RLDICL, Loc, MVT::i64, Val, getI64Imm(ME + 1, Loc), getI64Imm((MB + 63 - ME) & 63, Loc)),
SDValue Ops[] = {Val, getI64Imm(63 - ME, Loc), getI64Imm(0, Loc)};
CurDAG->SelectNodeTo(N, PPC::RLDICL, MVT::i64, Ops);

Notice that the second `rldicl` in the normal case above clears nothing, so its mb is 0. Can we add the clearing of the highest 8 bits there? That is, handle the mask as in the normal case and then clear the highest bits in the second `rldicl`?
4481	Can MB be calculated after checking that ME is valid?
4482	Check that the last bit is 1? Is it too complicated to call countTrailingZeros?
4486	ME is always 63? Why not use 63 directly?
4504	s/32/31, s/64/63

steven.zhang marked 3 inline comments as done. · Mar 13 2020, 1:26 AM

steven.zhang added inline comments.

llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp
4475	No. The pattern is 0x00fff00ff000. The idea is to calculate the positions [MB1, ME1] for fff and [MB2, ME2] for ff, and then use two rotate-and-clear instructions to do the calculation.
4482	The last bit doesn't necessarily have to be 1. We need to get the number of trailing 0's.
4486	There won't be any performance difference between the two. I use ME to make the logic clear.

oops, we indeed could simplify the logic here.

Address comments.

steven.zhang marked 3 inline comments as done. · Mar 17 2020, 12:28 AM

steven.zhang added inline comments.

llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp
4452	No. If it is a pattern that this patch matches, it will already be caught by a single rlwinm.
4475	In fact, both work. The pattern is 0x00fff00fff. I updated the patch to make it clearer.
4482	Good point. In fact, we don't need the check here, as isRunOfOnes64 will check it for us.
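The reply above leans on `isRunOfOnes64` to reject masks that are not a single, possibly wrapping, run of ones. A minimal standalone sketch of such a check follows. `isWrappedRunOfOnes` is a hypothetical name, the boundary-counting test is an illustrative technique and not LLVM's actual implementation, and unlike `isRunOfOnes64` it does not compute MB and ME.

```cpp
#include <cassert>
#include <cstdint>

// Rotate left by one bit so that bit 63 wraps around to bit 0.
inline uint64_t rotl1(uint64_t x) { return (x << 1) | (x >> 63); }

// Count set bits without relying on compiler builtins.
inline unsigned popcount64(uint64_t x) {
  unsigned n = 0;
  for (; x != 0; x &= x - 1)
    ++n;
  return n;
}

// Hypothetical check: true if the ones in x form one contiguous run when the
// 64-bit word is viewed as a ring, so the run may wrap from bit 0 to bit 63.
// x ^ rotl1(x) has a set bit at every 0/1 boundary on the ring; a single run
// has exactly two boundaries and an all-ones word has none.
inline bool isWrappedRunOfOnes(uint64_t x) {
  if (x == 0)
    return false; // Assumption: an empty mask is not treated as a run.
  return popcount64(x ^ rotl1(x)) <= 2;
}
```

For instance, under this model 0xFFFFFFFC000000FF (the and-mask.ll mask 0xC000000FF with its leading zeros filled) is accepted as a wrapped run, while 0xFFFFFFFFFFC0FFE0 (0x3FC0FFE0 filled the same way) is rejected because it has four boundaries.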

Harbormaster completed remote builds in B49401: Diff 250700. · Mar 17 2020, 1:34 AM

shchenz added inline comments. · Mar 19 2020, 8:43 PM

llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp

4479

For the special case, form a new Imm, like:

APInt Res(64, Imm64);
unsigned ClearBits = Res.countLeadingZeros();
if (ClearBits != 0) {
  // Change pattern |0001111100000011111111|
  //             to |1111111100000011111111|
  APInt Mask = APInt::getBitsSet(64, 64 - ClearBits, 64);
  Res |= Mask;
  Imm64 = Res.getZExtValue();
}

Then pass the new Imm64 and ClearBits to the normal case. I think the logic would be a little simpler that way?
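The suggestion above amounts to filling the leading zeros of the mask with ones and remembering how many bits were filled. A small standalone sketch of that step, using plain integers instead of `APInt`; `countLeadingZeros64` and `fillLeadingZeros` are hypothetical helpers written only for this illustration:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of zero bits above the most significant set bit
// (64 for an input of 0).
inline unsigned countLeadingZeros64(uint64_t x) {
  unsigned n = 0;
  for (uint64_t bit = uint64_t{1} << 63; bit != 0 && (x & bit) == 0; bit >>= 1)
    ++n;
  return n;
}

// Set the leading zero bits of imm to one and report how many were set, so
// that a later instruction can clear exactly those bits again.
inline uint64_t fillLeadingZeros(uint64_t imm, unsigned &clearBits) {
  clearBits = countLeadingZeros64(imm);
  if (clearBits == 0)
    return imm; // Nothing to fill.
  if (clearBits == 64)
    return ~uint64_t{0}; // Avoid shifting by 64, which is undefined.
  // Mask with the top clearBits bits set.
  uint64_t mask = ~uint64_t{0} << (64 - clearBits);
  return imm | mask;
}
```

With the and-mask.ll mask 0xC000000FF this yields 0xFFFFFFFC000000FF and a count of 28, which matches the `mb` operand of the second `rldicl` expected by that test.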

Address comments.

steven.zhang marked an inline comment as done. · Mar 22 2020, 8:26 PM

steven.zhang added inline comments.

llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp
4479	Good suggestion.

Harbormaster completed remote builds in B50057: Diff 251936. · Mar 22 2020, 8:57 PM

LGTM. Please hold off for a few days in case other reviewers have comments.

This revision is now accepted and ready to land. · Mar 23 2020, 12:38 AM

I think this would be immensely more readable if you use more descriptive names for variables. It is very hard to get this bit manipulation right in one's head, so I really think you should try your best to make this as simple to follow as possible:

You use one mask on line 4463 and then a different one on 4476. Please don't do this. Stick with a consistent example.
Rename RotateRightClearLeft to something like RightJustifyRangeAndClear as it appears that is what the function is doing.
Get rid of all the expressions involving ME/MB - especially things like <imm> +/- MB/ME as they are very difficult to reason about. For readability, favour defining temporary values just so they would have a name. For example: MB+63-ME is kind of meaningless to a reader. But if you do something like unsigned FirstBitSetWhenRightJustified = MB + 63 - ME; that is now much easier to follow. I realize that we are creating single-use temporaries this way, but I really think it is worth it for readability.

The algorithm appears to be something along the lines of:

if (!MaskIsTwoContiguousRunsOfOnes)
  return
// Add Missing Bits On Left To The Mask
// Right Justify Mask And Clear Bits Formerly In The Middle
// Rotate Back And Clear Bits Previously Added On Left

And I think the comments, function names and variable names should make it clear that this is what is happening.

llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp
4457	I think it is confusing for a function called `RotateRightClearLeft` to emit a "Rotate Left Clear Left"

In D71831#1936489, @nemanjai wrote:

I think this would be immensely more readable if you use more descriptive names for variables. It is very hard to get this bit manipulation right in one's head, so I really think you should try your best to make this as simple to follow as possible:

You use one mask on line 4463 and then a different one on 4476. Please don't do this. Stick with a consistent example.

Good catch.

Rename RotateRightClearLeft to something like RightJustifyRangeAndClear as it appears that is what the function is doing.

In the ISA, RLDICL is named "Rotate Left Doubleword Immediate then Clear Left". I am not sure that "right justify range" makes it any clearer. As for left versus right, it is just a wrapping rotate. The lambda here tries to hide the detail that the right rotate is implemented with a left rotate. Personally, I prefer RotateRightClearLeft, but I am also OK with RightJustifyRangeAndClear if you insist.
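The point about hiding a right rotate behind a left rotate rests on a simple identity: rotating a 64-bit value right by n bits is the same as rotating it left by (64 - n) mod 64 bits. A brief sketch, with `rotl64` and `rotr64` as hypothetical helper names that are not part of the patch:

```cpp
#include <cassert>
#include <cstdint>

// Rotate left by n bits (0 <= n <= 63); the masking avoids a shift by 64.
constexpr uint64_t rotl64(uint64_t x, unsigned n) {
  return (x << (n & 63)) | (x >> ((64 - n) & 63));
}

// Rotate right expressed through rotate left, since rldicl only rotates left.
constexpr uint64_t rotr64(uint64_t x, unsigned n) {
  return rotl64(x, (64 - n) & 63);
}

static_assert(rotr64(0x1ULL, 1) == 0x8000000000000000ULL,
              "bit 0 wraps around to bit 63");
static_assert(rotr64(rotl64(0x0123456789ABCDEFULL, 22), 22) ==
                  0x0123456789ABCDEFULL,
              "a left rotate followed by the same right rotate is a no-op");
```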

Get rid of all the expressions involving ME/MB - especially things like <imm> +/- MB/ME as they are very difficult to reason about. For readability, favour defining temporary values just so they would have a name. For example: MB+63-ME is kind of meaningless to a reader. But if you do something like unsigned FirstBitSetWhenRightJustified = MB + 63 - ME; that is now much easier to follow. I realize that we are creating single-use temporaries this way, but I really think it is worth it for readability.

It is always a balance :) I agree to use temporary variables here, as the rotate logic is indeed hard to follow.

The algorithm appears to be something along the lines of:
if (!MaskIsTwoContiguousRunsOfOnes)
  return
// Add Missing Bits On Left To The Mask
// Right Justify Mask And Clear Bits Formerly In The Middle
// Rotate Back And Clear Bits Previously Added On Left
And I think the comments, function names and variable names should make it clear that this is what is happening.

That is nice. Thank you.

Address comments.

Harbormaster completed remote builds in B50208: Diff 252213. · Mar 23 2020, 10:17 PM

@nemanjai Any more comments ?

In D71831#1938331, @steven.zhang wrote:

In D71831#1936489, @nemanjai wrote:

Rename RotateRightClearLeft to something like RightJustifyRangeAndClear as it appears that is what the function is doing.

In the ISA, RLDICL is named "Rotate Left Doubleword Immediate then Clear Left". I am not sure that "right justify range" makes it any clearer. As for left versus right, it is just a wrapping rotate. The lambda here tries to hide the detail that the right rotate is implemented with a left rotate. Personally, I prefer RotateRightClearLeft, but I am also OK with RightJustifyRangeAndClear if you insist.

I think we can retire this discussion quite easily by just removing the lambda. There isn't a whole lot of point to a lambda that is used only once. If it helped with readability I would be in favour of it, but I think it actually makes the code less readable.

In any case, I described the concise algorithm in pseudo-code not so much because I think it is a useful comment, but because I think that is how the function should flow. For example, isn't something like this much more readable:

bool PPCDAGToDAGISel::tryAsPairOfRLDICL(SDNode *N) {
  assert(N->getOpcode() == ISD::AND && "ISD::AND SDNode expected");
  uint64_t Imm64;

  // Do nothing if it is 16-bit imm as the pattern in the .td file handles
  // it well with "andi.".
  if (!isInt64Immediate(N->getOperand(1).getNode(), Imm64) || isUInt<16>(Imm64))
    return false;

  SDLoc Loc(N);
  SDValue Val = N->getOperand(0);

  // Optimized with two rldicl's as follows:
  // Add missing bits on left to the mask and check that the mask is a
  // wrapped run of ones, i.e.
  // Change pattern |0001111100000011111111|
  //             to |1111111100000011111111|.
  unsigned NumOfLeadingZeros = countLeadingZeros(Imm64);
  if (NumOfLeadingZeros != 0)
    Imm64 |= maskLeadingOnes<uint64_t>(NumOfLeadingZeros);
  unsigned MB, ME;
  if (!isRunOfOnes64(Imm64, MB, ME))
    return false;

  //         ME     MB
  // +----------------------+     +----------------------+
  // |1111111100000011111111| ->  |0000001111111111111111|
  // +----------------------+     +----------------------+
  //  0                    63      0                    63
  // There are ME + 1 ones on the left and (MB - ME + 63) & 63 zeros in between.
  unsigned OnesOnLeft = ME + 1;
  unsigned ZerosInBetween = (MB - ME + 63) & 63;

  // Rotate left by OnesOnLeft (so leading ones are now trailing ones) and clear
  // on the left the bits that are already zeros in the mask.
  Val = SDValue(CurDAG->getMachineNode(PPC::RLDICL, Loc, MVT::i64, Val,
                                       getI64Imm(OnesOnLeft, Loc),
                                       getI64Imm(ZerosInBetween, Loc)),
                0);

  //                                      ME     MB
  // +----------------------+     +----------------------+
  // |0000001111111111111111| ->  |0001111100000011111111|
  // +----------------------+     +----------------------+
  //  0                    63      0                    63
  // Rotate back by 64 - OnesOnLeft to undo previous rotate. Then clear on the
  // left the number of ones we previously added.
  SDValue Ops[] = {Val, getI64Imm(64 - OnesOnLeft, Loc),
                   getI64Imm(NumOfLeadingZeros, Loc)};
  CurDAG->SelectNodeTo(N, PPC::RLDICL, MVT::i64, Ops);
  return true;
}

Note that the above is just a sample of how I think the code should be structured for readability. While I am fairly sure it is semantically equivalent to what you have, I did not carefully verify this.
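To make the flow of the sample above easier to check by hand, here is a standalone sketch that computes the two `rldicl` operand pairs for a given mask and emulates the result on plain integers. Everything in it (`RldiclPair`, `planPairOfRldicl`, `applyPair`, the loop-based bit counting) is hypothetical scaffolding for illustration. It does not call LLVM's `isRunOfOnes64`; instead it assumes that, once the leading zeros are filled, the number of leading ones equals ME + 1 and the number of remaining zeros equals (MB - ME + 63) & 63.

```cpp
#include <cassert>
#include <cstdint>

// Operands (sh, mb) of the first and second rldicl.
struct RldiclPair {
  unsigned sh1, mb1, sh2, mb2;
};

inline unsigned leadingZeros(uint64_t x) {
  unsigned n = 0;
  for (uint64_t bit = uint64_t{1} << 63; bit != 0 && (x & bit) == 0; bit >>= 1)
    ++n;
  return n;
}

inline unsigned popcount(uint64_t x) {
  unsigned n = 0;
  for (; x != 0; x &= x - 1)
    ++n;
  return n;
}

inline uint64_t rotl(uint64_t x, unsigned sh) {
  return (x << (sh & 63)) | (x >> ((64 - sh) & 63));
}

// Emulated rldicl: rotate left by sh, then clear the mb most significant bits.
inline uint64_t rldicl(uint64_t x, unsigned sh, unsigned mb) {
  return rotl(x, sh) & (~uint64_t{0} >> mb);
}

// Returns true and fills `out` if `mask` can be applied with two rldicl's.
inline bool planPairOfRldicl(uint64_t mask, RldiclPair &out) {
  if (mask == 0)
    return false;
  // Step 1: add the missing bits on the left of the mask.
  unsigned numLeadingZeros = leadingZeros(mask);
  uint64_t filled = mask;
  if (numLeadingZeros != 0)
    filled |= ~uint64_t{0} << (64 - numLeadingZeros);
  // The filled mask must be one run of ones on the 64-bit ring and must still
  // contain zeros (assumption: all-ones is left to simpler patterns).
  if (filled == ~uint64_t{0} || popcount(filled ^ rotl(filled, 1)) != 2)
    return false;
  // Step 2: right-justify the run and clear the zeros formerly in the middle.
  unsigned onesOnLeft = leadingZeros(~filled);
  unsigned zerosInBetween = 64 - popcount(filled);
  // Step 3: rotate back and clear the bits added in step 1.
  out = {onesOnLeft, zerosInBetween, 64 - onesOnLeft, numLeadingZeros};
  return true;
}

inline uint64_t applyPair(uint64_t x, const RldiclPair &p) {
  return rldicl(rldicl(x, p.sh1, p.mb1), p.sh2, p.mb2);
}
```

For the and-mask.ll masks 0xFFFFFFC00000 and 0xC000000FF this model produces the operand pairs (42, 22), (22, 16) and (30, 26), (34, 28), the same as the CHECK lines in that test, and it rejects 0x3FC0FFE0, which the test still expects to be lowered with `lis`/`ori`/`and`. It has not been compared against the real selector for other masks.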

Makes sense. I will update the patch. Thank you for the comments.

Address Nemanjai's comments.

Harbormaster failed remote builds in B53290: Diff 257594! · Apr 14 2020, 9:13 PM

I will commit this patch if there are no more comments in the next several days. Thank you for the nice refactor. (@nemanjai )

LGTM. Thanks for your patience. As far as I'm concerned, feel free to commit when you're ready.

Closed by commit rG4bd186c0ff76: [PowerPC] Exploit the rldicl + rldicl when and with mask (authored by steven.zhang). · Apr 16 2020, 10:42 PM

This revision was automatically updated to reflect the committed changes.

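Revision Contents

Path	Size
llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp	57 lines
llvm/test/CodeGen/PowerPC/2016-04-17-combine.ll	4 lines
llvm/test/CodeGen/PowerPC/Frames-dyn-alloca.ll	18 lines
llvm/test/CodeGen/PowerPC/and-mask.ll	29 lines
llvm/test/CodeGen/PowerPC/cmpb.ll	16 lines
llvm/test/CodeGen/PowerPC/setcc-logic.ll	4 lines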

Diff 258235

llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp

[... 345 lines not shown ...]

private:		private:
bool trySETCC(SDNode *N);		bool trySETCC(SDNode *N);
bool tryAsSingleRLDICL(SDNode *N);		bool tryAsSingleRLDICL(SDNode *N);
bool tryAsSingleRLDICR(SDNode *N);		bool tryAsSingleRLDICR(SDNode *N);
bool tryAsSingleRLWINM(SDNode *N);		bool tryAsSingleRLWINM(SDNode *N);
bool tryAsSingleRLWINM8(SDNode *N);		bool tryAsSingleRLWINM8(SDNode *N);
bool tryAsSingleRLWIMI(SDNode *N);		bool tryAsSingleRLWIMI(SDNode *N);
		bool tryAsPairOfRLDICL(SDNode *N);

void PeepholePPC64();		void PeepholePPC64();
void PeepholePPC64ZExt();		void PeepholePPC64ZExt();
void PeepholeCROps();		void PeepholeCROps();

SDValue combineToCMPB(SDNode *N);		SDValue combineToCMPB(SDNode *N);
void foldBoolExts(SDValue &Res, SDNode *&N);		void foldBoolExts(SDValue &Res, SDNode *&N);

[... 4,072 lines not shown ...]	SDValue Ops[] = {N->getOperand(0), getI64Imm(0, dl), getI64Imm(MB - 32, dl),
getI64Imm(ME - 32, dl)};		getI64Imm(ME - 32, dl)};
CurDAG->SelectNodeTo(N, PPC::RLWINM8, MVT::i64, Ops);		CurDAG->SelectNodeTo(N, PPC::RLWINM8, MVT::i64, Ops);
return true;		return true;
}		}

return false;		return false;
}		}

		bool PPCDAGToDAGISel::tryAsPairOfRLDICL(SDNode *N) {
		assert(N->getOpcode() == ISD::AND && "ISD::AND SDNode expected");
		uint64_t Imm64;
		if (!isInt64Immediate(N->getOperand(1).getNode(), Imm64))
		return false;

		// Do nothing if it is 16-bit imm as the pattern in the .td file handle
		// it well with "andi.".
		if (isUInt<16>(Imm64))
		return false;

		SDLoc Loc(N);
		SDValue Val = N->getOperand(0);

		// Optimized with two rldicl's as follows:
		// Add missing bits on left to the mask and check that the mask is a
		// wrapped run of ones, i.e.
		// Change pattern |0001111100000011111111|
		//             to |1111111100000011111111|.
		unsigned NumOfLeadingZeros = countLeadingZeros(Imm64);
		if (NumOfLeadingZeros != 0)
		Imm64 |= maskLeadingOnes<uint64_t>(NumOfLeadingZeros);

		unsigned MB, ME;
		if (!isRunOfOnes64(Imm64, MB, ME))
		return false;

		//         ME     MB                   MB-ME+63
		// +----------------------+     +----------------------+
		// |1111111100000011111111| ->  |0000001111111111111111|
		// +----------------------+     +----------------------+
		//  0                    63      0                    63
		// There are ME + 1 ones on the left and (MB - ME + 63) & 63 zeros in between.
		unsigned OnesOnLeft = ME + 1;
		unsigned ZerosInBetween = (MB - ME + 63) & 63;
		// Rotate left by OnesOnLeft (so leading ones are now trailing ones) and clear
		// on the left the bits that are already zeros in the mask.
		Val = SDValue(CurDAG->getMachineNode(PPC::RLDICL, Loc, MVT::i64, Val,
		getI64Imm(OnesOnLeft, Loc),
		getI64Imm(ZerosInBetween, Loc)),
		0);
		//          MB-ME+63                      ME     MB
		// +----------------------+     +----------------------+
		// |0000001111111111111111| ->  |0001111100000011111111|
		// +----------------------+     +----------------------+
		//  0                    63      0                    63
		// Rotate back by 64 - OnesOnLeft to undo previous rotate. Then clear on the
		// left the number of ones we previously added.
		SDValue Ops[] = {Val, getI64Imm(64 - OnesOnLeft, Loc),
		getI64Imm(NumOfLeadingZeros, Loc)};
		CurDAG->SelectNodeTo(N, PPC::RLDICL, MVT::i64, Ops);
		return true;
		}

bool PPCDAGToDAGISel::tryAsSingleRLWIMI(SDNode *N) {		bool PPCDAGToDAGISel::tryAsSingleRLWIMI(SDNode *N) {
assert(N->getOpcode() == ISD::AND && "ISD::AND SDNode expected");		assert(N->getOpcode() == ISD::AND && "ISD::AND SDNode expected");
unsigned Imm;		unsigned Imm;
if (!isInt32Immediate(N->getOperand(1), Imm))		if (!isInt32Immediate(N->getOperand(1), Imm))
return false;		return false;

SDValue Val = N->getOperand(0);		SDValue Val = N->getOperand(0);
unsigned Imm2;		unsigned Imm2;
// ISD::OR doesn't get all the bitfield insertion fun.		// ISD::OR doesn't get all the bitfield insertion fun.
// (and (or x, c1), c2) where isRunOfOnes(~(c1^c2)) might be a		// (and (or x, c1), c2) where isRunOfOnes(~(c1^c2)) might be a
// bitfield insert.		// bitfield insert.
if (Val.getOpcode() != ISD::OR || !isInt32Immediate(Val.getOperand(1), Imm2))		if (Val.getOpcode() != ISD::OR || !isInt32Immediate(Val.getOperand(1), Imm2))
return false;		return false;

// The idea here is to check whether this is equivalent to:		// The idea here is to check whether this is equivalent to:
// (c1 & m) | (x & ~m)		// (c1 & m) | (x & ~m)
[... 303 lines not shown ...]	if (Offset.getOpcode() == ISD::TargetConstant ||
ReplaceNode(N, MN);		ReplaceNode(N, MN);
return;		return;
}		}
}		}

case ISD::AND:		case ISD::AND:
// If this is an 'and' with a mask, try to emit rlwinm/rldicl/rldicr		// If this is an 'and' with a mask, try to emit rlwinm/rldicl/rldicr
if (tryAsSingleRLWINM(N) || tryAsSingleRLWIMI(N) || tryAsSingleRLDICL(N) ||		if (tryAsSingleRLWINM(N) || tryAsSingleRLWIMI(N) || tryAsSingleRLDICL(N) ||
tryAsSingleRLDICR(N) || tryAsSingleRLWINM8(N))		tryAsSingleRLDICR(N) || tryAsSingleRLWINM8(N) || tryAsPairOfRLDICL(N))
return;		return;

// Other cases are autogenerated.		// Other cases are autogenerated.
break;		break;
case ISD::OR: {		case ISD::OR: {
if (N->getValueType(0) == MVT::i32)		if (N->getValueType(0) == MVT::i32)
if (tryBitfieldInsert(N))		if (tryBitfieldInsert(N))
return;		return;
[... 1,905 lines not shown ...]

llvm/test/CodeGen/PowerPC/2016-04-17-combine.ll

	; RUN: llc -verify-machineinstrs <%s | FileCheck %s			; RUN: llc -verify-machineinstrs <%s | FileCheck %s
	target datalayout = "e-m:e-i64:64-n32:64"			target datalayout = "e-m:e-i64:64-n32:64"
	target triple = "powerpc64le-unknown-linux-gnu"			target triple = "powerpc64le-unknown-linux-gnu"

	; PR27390 crasher			; PR27390 crasher

	%typ = type { i32, i32 }			%typ = type { i32, i32 }

	; On release builds, it doesn't crash, spewing nonsense instead.			; On release builds, it doesn't crash, spewing nonsense instead.
	; To make sure it works, check that and is still alive.			; To make sure it works, check that rldicl is still alive.
	; CHECK: and			; CHECK: rldicl
	; Also, in release, it emits a COPY from a 32-bit register to			; Also, in release, it emits a COPY from a 32-bit register to
	; a 64-bit register, which happens to be emitted as cror [!]			; a 64-bit register, which happens to be emitted as cror [!]
	; by the confused CodeGen. Just to be sure, check there isn't one.			; by the confused CodeGen. Just to be sure, check there isn't one.
	; CHECK-NOT: cror			; CHECK-NOT: cror
	; Function Attrs: uwtable			; Function Attrs: uwtable
	define signext i32 @_Z8access_pP1Tc(%typ* %p, i8 zeroext %type) {			define signext i32 @_Z8access_pP1Tc(%typ* %p, i8 zeroext %type) {
	%b = getelementptr inbounds %typ, %typ* %p, i64 0, i32 1			%b = getelementptr inbounds %typ, %typ* %p, i64 0, i32 1
	%1 = load i32, i32* %b, align 4			%1 = load i32, i32* %b, align 4
	%2 = ptrtoint i32* %b to i64			%2 = ptrtoint i32* %b to i64
	%3 = and i64 %2, -35184372088833			%3 = and i64 %2, -35184372088833
	%4 = inttoptr i64 %3 to i32*			%4 = inttoptr i64 %3 to i32*
	%_msld = load i32, i32* %4, align 4			%_msld = load i32, i32* %4, align 4
	%zzz = add i32 %1, %_msld			%zzz = add i32 %1, %_msld
	ret i32 %zzz			ret i32 %zzz
	}			}

llvm/test/CodeGen/PowerPC/Frames-dyn-alloca.ll

	[... 37 lines not shown ...]
	; PPC32-LINUX-NEXT: lwz 0, -4(31)			; PPC32-LINUX-NEXT: lwz 0, -4(31)
	; PPC32-LINUX-NEXT: mr 1, 31			; PPC32-LINUX-NEXT: mr 1, 31
	; PPC32-LINUX-NEXT: mr 31, 0			; PPC32-LINUX-NEXT: mr 31, 0
	; PPC32-LINUX-NEXT: blr			; PPC32-LINUX-NEXT: blr

	; PPC64-LINUX-LABEL: f1			; PPC64-LINUX-LABEL: f1
	; PPC64-LINUX: std 31, -8(1)			; PPC64-LINUX: std 31, -8(1)
	; PPC64-LINUX-NEXT: stdu 1, -64(1)			; PPC64-LINUX-NEXT: stdu 1, -64(1)
	; PPC64-LINUX-NEXT: lis 4, 32767
	; PPC64-LINUX-NEXT: rldic 3, 3, 2, 30			; PPC64-LINUX-NEXT: rldic 3, 3, 2, 30
	; PPC64-LINUX-NEXT: ori 4, 4, 65535
	; PPC64-LINUX-NEXT: addi 3, 3, 15
	; PPC64-LINUX-NEXT: sldi 4, 4, 4
	; PPC64-LINUX-NEXT: mr 31, 1			; PPC64-LINUX-NEXT: mr 31, 1
	; PPC64-LINUX-NEXT: and 3, 3, 4			; PPC64-LINUX-NEXT: addi 3, 3, 15
	; PPC64-LINUX-NEXT: neg 3, 3			; PPC64-LINUX-NEXT: rldicl 3, 3, 60, 4
	; PPC64-LINUX-NEXT: addi 4, 31, 64			; PPC64-LINUX-NEXT: addi 4, 31, 64
				; PPC64-LINUX-NEXT: rldicl 3, 3, 4, 29
				; PPC64-LINUX-NEXT: neg 3, 3
	; PPC64-LINUX-NEXT: stdux 4, 1, 3			; PPC64-LINUX-NEXT: stdux 4, 1, 3

	; The linkage area is always put on the top of the stack.			; The linkage area is always put on the top of the stack.
	; PPC64-LINUX-NEXT: addi 3, 1, 48			; PPC64-LINUX-NEXT: addi 3, 1, 48

	; PPC64-LINUX-NEXT: ld 1, 0(1)			; PPC64-LINUX-NEXT: ld 1, 0(1)
	; PPC64-LINUX-NEXT: ld 31, -8(1)			; PPC64-LINUX-NEXT: ld 31, -8(1)
	; PPC64-LINUX-NEXT: blr			; PPC64-LINUX-NEXT: blr
	[... 14 lines not shown ...]

	; PPC32-AIX-NEXT: lwz 1, 0(1)			; PPC32-AIX-NEXT: lwz 1, 0(1)
	; PPC32-AIX-NEXT: lwz 31, -4(1)			; PPC32-AIX-NEXT: lwz 31, -4(1)
	; PPC32-AIX-NEXT: blr			; PPC32-AIX-NEXT: blr

	; PPC64-AIX-LABEL: f1			; PPC64-AIX-LABEL: f1
	; PPC64-AIX: std 31, -8(1)			; PPC64-AIX: std 31, -8(1)
	; PPC64-AIX-NEXT: stdu 1, -64(1)			; PPC64-AIX-NEXT: stdu 1, -64(1)
	; PPC64-AIX-NEXT: lis 4, 32767
	; PPC64-AIX-NEXT: rldic 3, 3, 2, 30			; PPC64-AIX-NEXT: rldic 3, 3, 2, 30
	; PPC64-AIX-NEXT: ori 4, 4, 65535
	; PPC64-AIX-NEXT: addi 3, 3, 15
	; PPC64-AIX-NEXT: sldi 4, 4, 4
	; PPC64-AIX-NEXT: mr 31, 1			; PPC64-AIX-NEXT: mr 31, 1
	; PPC64-AIX-NEXT: and 3, 3, 4			; PPC64-AIX-NEXT: addi 3, 3, 15
	; PPC64-AIX-NEXT: addi 4, 31, 64			; PPC64-AIX-NEXT: addi 4, 31, 64
				; PPC64-AIX-NEXT: rldicl 3, 3, 60, 4
				; PPC64-AIX-NEXT: rldicl 3, 3, 4, 29
	; PPC64-AIX-NEXT: neg 3, 3			; PPC64-AIX-NEXT: neg 3, 3
	; PPC64-AIX-NEXT: stdux 4, 1, 3			; PPC64-AIX-NEXT: stdux 4, 1, 3

	; The linkage area is always put on the top of the stack.			; The linkage area is always put on the top of the stack.
	; PPC64-AIX-NEXT: addi 3, 1, 48			; PPC64-AIX-NEXT: addi 3, 1, 48

	; PPC64-AIX-NEXT: ld 1, 0(1)			; PPC64-AIX-NEXT: ld 1, 0(1)
	; PPC64-AIX-NEXT: ld 31, -8(1)			; PPC64-AIX-NEXT: ld 31, -8(1)
	; PPC64-AIX-NEXT: blr			; PPC64-AIX-NEXT: blr

llvm/test/CodeGen/PowerPC/and-mask.ll

[... 9 lines not shown ...]	; CHECK-NEXT: blr
%and = and i32 %a, -2		%and = and i32 %a, -2
ret i32 %and		ret i32 %and
}		}

; mask 0xFFFFFFFFFFFFFFF9		; mask 0xFFFFFFFFFFFFFFF9
define i64 @test2(i64 %a) {		define i64 @test2(i64 %a) {
; CHECK-LABEL: test2:		; CHECK-LABEL: test2:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: li 4, -7		; CHECK-NEXT: rldicl 3, 3, 61, 2
; CHECK-NEXT: and 3, 3, 4		; CHECK-NEXT: rotldi 3, 3, 3
; CHECK-NEXT: blr		; CHECK-NEXT: blr
%and = and i64 %a, -7		%and = and i64 %a, -7
ret i64 %and		ret i64 %and
}		}

; mask: 0xFFFFFFC00000		; mask: 0xFFFFFFC00000
define i64 @test3(i64 %a) {		define i64 @test3(i64 %a) {
; CHECK-LABEL: test3:		; CHECK-LABEL: test3:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: lis 4, 1023		; CHECK-NEXT: rldicl 3, 3, 42, 22
; CHECK-NEXT: ori 4, 4, 65535		; CHECK-NEXT: rldicl 3, 3, 22, 16
; CHECK-NEXT: sldi 4, 4, 22
; CHECK-NEXT: and 3, 3, 4
; CHECK-NEXT: blr		; CHECK-NEXT: blr
%and = and i64 %a, 281474972516352		%and = and i64 %a, 281474972516352
ret i64 %and		ret i64 %and
}		}

; mask: 0xC000000FF		; mask: 0xC000000FF
define i64 @test4(i64 %a) {		define i64 @test4(i64 %a) {
; CHECK-LABEL: test4:		; CHECK-LABEL: test4:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: li 4, 12		; CHECK-NEXT: rldicl 3, 3, 30, 26
; CHECK-NEXT: sldi 4, 4, 32		; CHECK-NEXT: rldicl 3, 3, 34, 28
; CHECK-NEXT: ori 4, 4, 255
; CHECK-NEXT: and 3, 3, 4
; CHECK-NEXT: blr		; CHECK-NEXT: blr
%and = and i64 %a, 51539607807		%and = and i64 %a, 51539607807
ret i64 %and		ret i64 %and
}		}

; mask: 0xFFC0FFFF		; mask: 0xFFC0FFFF
define i64 @test5(i64 %a) {		define i64 @test5(i64 %a) {
; CHECK-LABEL: test5:		; CHECK-LABEL: test5:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: li 4, 0		; CHECK-NEXT: rldicl 3, 3, 42, 6
; CHECK-NEXT: oris 4, 4, 65472		; CHECK-NEXT: rldicl 3, 3, 22, 32
; CHECK-NEXT: ori 4, 4, 65535
; CHECK-NEXT: and 3, 3, 4
; CHECK-NEXT: blr		; CHECK-NEXT: blr
%and = and i64 %a, 4290838527		%and = and i64 %a, 4290838527
ret i64 %and		ret i64 %and
}		}

; mask: 0x3FC0FFE0		; mask: 0x3FC0FFE0
define i64 @test6(i64 %a) {		define i64 @test6(i64 %a) {
; CHECK-LABEL: test6:		; CHECK-LABEL: test6:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: lis 4, 16320		; CHECK-NEXT: lis 4, 16320
; CHECK-NEXT: ori 4, 4, 65504		; CHECK-NEXT: ori 4, 4, 65504
; CHECK-NEXT: and 3, 3, 4		; CHECK-NEXT: and 3, 3, 4
; CHECK-NEXT: blr		; CHECK-NEXT: blr
%and = and i64 %a, 1069613024		%and = and i64 %a, 1069613024
ret i64 %and		ret i64 %and
}		}

; mask: 0x3FC000001FFFF		; mask: 0x3FC000001FFFF
define i64 @test7(i64 %a) {		define i64 @test7(i64 %a) {
; CHECK-LABEL: test7:		; CHECK-LABEL: test7:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: li 4, -32767		; CHECK-NEXT: rldicl 3, 3, 22, 25
; CHECK-NEXT: sldi 4, 4, 32		; CHECK-NEXT: rldicl 3, 3, 42, 14
; CHECK-NEXT: oris 4, 4, 65024
; CHECK-NEXT: rldicr 4, 4, 17, 63
; CHECK-NEXT: and 3, 3, 4
; CHECK-NEXT: blr		; CHECK-NEXT: blr
%and = and i64 %a, 1121501860462591		%and = and i64 %a, 1121501860462591
ret i64 %and		ret i64 %and
}		}

llvm/test/CodeGen/PowerPC/cmpb.ll

[... 117 lines not shown ...]	entry:
%conv50 = select i1 %cmp34, i32 458752, i32 0		%conv50 = select i1 %cmp34, i32 458752, i32 0
%conv53 = select i1 %cmp40, i32 -16777216, i32 0		%conv53 = select i1 %cmp40, i32 -16777216, i32 0
%or = or i32 %conv48, %conv53		%or = or i32 %conv48, %conv53
%or52 = or i32 %or, %conv47		%or52 = or i32 %or, %conv47
%or55 = or i32 %or52, %conv50		%or55 = or i32 %or52, %conv50
ret i32 %or55		ret i32 %or55

; CHECK-LABEL: @test32p1		; CHECK-LABEL: @test32p1
; CHECK: li [[REG1:[0-9]+]], 0		; CHECK: cmpb [[REG1:[0-9]+]], 4, 3
; CHECK: cmpb [[REG4:[0-9]+]], 4, 3		; CHECK: rldicl [[REG2:[0-9]+]], [[REG1]], 40, 5
; CHECK: oris [[REG2:[0-9]+]], [[REG1]], 65287		; CHECK: rldicl 3, [[REG2]], 24, 32
; CHECK: ori [[REG3:[0-9]+]], [[REG2]], 65535
; CHECK: and 3, [[REG4]], [[REG3]]
; CHECK: blr		; CHECK: blr
}		}

define zeroext i32 @test32p2(i32 zeroext %x, i32 zeroext %y) #0 {		define zeroext i32 @test32p2(i32 zeroext %x, i32 zeroext %y) #0 {
entry:		entry:
%0 = xor i32 %y, %x		%0 = xor i32 %y, %x
%1 = and i32 %0, 255		%1 = and i32 %0, 255
%cmp = icmp eq i32 %1, 0		%cmp = icmp eq i32 %1, 0
%2 = and i32 %0, 65280		%2 = and i32 %0, 65280
%cmp22 = icmp eq i32 %2, 0		%cmp22 = icmp eq i32 %2, 0
%cmp28 = icmp ult i32 %0, 16777216		%cmp28 = icmp ult i32 %0, 16777216
%conv32 = select i1 %cmp, i32 255, i32 0		%conv32 = select i1 %cmp, i32 255, i32 0
%conv33 = select i1 %cmp22, i32 65280, i32 0		%conv33 = select i1 %cmp22, i32 65280, i32 0
%conv35 = select i1 %cmp28, i32 -16777216, i32 0		%conv35 = select i1 %cmp28, i32 -16777216, i32 0
%or = or i32 %conv33, %conv35		%or = or i32 %conv33, %conv35
%or37 = or i32 %or, %conv32		%or37 = or i32 %or, %conv32
ret i32 %or37		ret i32 %or37

; CHECK-LABEL: @test32p2		; CHECK-LABEL: @test32p2
; CHECK: li [[REG1:[0-9]+]], 0		; CHECK: cmpb [[REG1:[0-9]+]], 4, 3
; CHECK: cmpb [[REG4:[0-9]+]], 4, 3		; CHECK: rldicl [[REG2:[0-9]+]], [[REG1]], 40, 8
; CHECK: oris [[REG2:[0-9]+]], [[REG1]], 65280		; CHECK: rldicl 3, [[REG2]], 24, 32
; CHECK: ori [[REG3:[0-9]+]], [[REG2]], 65535
; CHECK: and 3, [[REG4]], [[REG3]]
; CHECK: blr		; CHECK: blr
}		}

define i64 @test64(i64 %x, i64 %y) #0 {		define i64 @test64(i64 %x, i64 %y) #0 {
entry:		entry:
%shr19 = lshr i64 %x, 56		%shr19 = lshr i64 %x, 56
%conv21 = trunc i64 %shr19 to i32		%conv21 = trunc i64 %shr19 to i32
%shr43 = lshr i64 %y, 56		%shr43 = lshr i64 %y, 56

llvm/test/CodeGen/PowerPC/setcc-logic.ll

; CHECK-NEXT: blr
%cmp2 = icmp eq <4 x i32> %c, %d		%cmp2 = icmp eq <4 x i32> %c, %d
%and = and <4 x i1> %cmp1, %cmp2		%and = and <4 x i1> %cmp1, %cmp2
ret <4 x i1> %and		ret <4 x i1> %and
}		}

define i1 @or_icmps_const_1bit_diff(i64 %x) {		define i1 @or_icmps_const_1bit_diff(i64 %x) {
; CHECK-LABEL: or_icmps_const_1bit_diff:		; CHECK-LABEL: or_icmps_const_1bit_diff:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: li 4, -5
; CHECK-NEXT: addi 3, 3, -13		; CHECK-NEXT: addi 3, 3, -13
; CHECK-NEXT: and 3, 3, 4		; CHECK-NEXT: rldicl 3, 3, 61, 1
		; CHECK-NEXT: rotldi 3, 3, 3
; CHECK-NEXT: cntlzd 3, 3		; CHECK-NEXT: cntlzd 3, 3
; CHECK-NEXT: rldicl 3, 3, 58, 63		; CHECK-NEXT: rldicl 3, 3, 58, 63
; CHECK-NEXT: blr		; CHECK-NEXT: blr
%a = icmp eq i64 %x, 17		%a = icmp eq i64 %x, 17
%b = icmp eq i64 %x, 13		%b = icmp eq i64 %x, 13
%r = or i1 %a, %b		%r = or i1 %a, %b
ret i1 %r		ret i1 %r
}		}
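
The operand values in the updated CHECK lines follow a regular pattern that can be derived from the mask alone. The sketch below is an inference from the expected output in this diff, not a copy of the code in PPCISelDAGToDAG.cpp: it assumes the mask, once its leading zeros are filled with ones, has the shape ones, zeros, ones. The function name `rldicl_pair_for_mask` and its helper are hypothetical.

```python
MASK64 = (1 << 64) - 1

def leading_zeros64(v):
    """Number of leading zero bits of v viewed as a 64-bit value."""
    return 64 - v.bit_length()

def rldicl_pair_for_mask(mask):
    """Return ((sh1, mb1), (sh2, mb2)) for two rldicl instructions,
    or None when the mask does not have the assumed shape."""
    lz = leading_zeros64(mask)
    # Fill the leading zeros with ones; the second rldicl clears them again.
    filled = mask | (MASK64 ^ (MASK64 >> lz))
    inv = ~filled & MASK64          # the remaining zeros, as set bits
    if inv == 0:
        return None                 # no zeros in between (e.g. 0xFF)
    tz = (inv & -inv).bit_length() - 1   # trailing ones of the filled mask
    run = inv >> tz
    if tz == 0 or (run & (run + 1)) != 0:
        return None                 # trailing zeros, or more than one gap
    zeros_between = run.bit_length()
    ones_on_left = 64 - tz - zeros_between
    # Rotate the left-hand ones down to the bottom and clear the gap,
    # then rotate back and clear the original leading zeros.
    return (ones_on_left, zeros_between), (64 - ones_on_left, lz)
```

For the mask in `or_icmps_const_1bit_diff` (`li 4, -5`, that is 0xFFFFFFFFFFFFFFFB) this yields (61, 1) and (3, 0); an `rldicl` with a mask begin of 0 is a plain rotate, which matches the `rotldi 3, 3, 3` in the expected output. For 0x3FC0FFE0 (test6 in and-mask.ll) it yields None, consistent with that test being left unchanged.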