This is an archive of the discontinued LLVM Phabricator instance.

Differential D47735

[DAGCombiner] Create rotates more aggressively
Needs Review · Public

Authored by kparzysz on Jun 4 2018, 12:04 PM.

Details

Reviewers

spatel
RKSimon
efriedma
lebedev.ri

Summary

The DAG combiner can recognize a pattern of ORed shifts that evaluate to a bit rotation. When the rotation is ORed with another value, the OR operations can get reassociated in such a way that the rotation will no longer be identified. This patch implements a more aggressive analysis of OR operations to detect rotation patterns.
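As a rough illustration of the pattern described in the summary, the sketch below writes a 32-bit rotate as two ORed shifts and then ORs it with an unrelated value in two association orders, mirroring the shape of f11 in rotate.ll. The function names are hypothetical and appear nowhere in the patch; the point is only that both forms compute the same value while only the first keeps the two shifts under one OR.

```cpp
#include <cassert>
#include <cstdint>

// Rotate left expressed as the OR of two shifts. Amt must be in [1, 31] so
// that neither shift amount reaches the bit width.
inline uint32_t rotl32(uint32_t X, unsigned Amt) {
  assert(Amt > 0 && Amt < 32 && "shift amounts must stay in range");
  return (X << Amt) | (X >> (32 - Amt));
}

// Directly matchable form: both shifts are operands of the same OR.
inline uint32_t accumulateDirect(uint32_t S, uint32_t X) {
  return S | ((X << 7) | (X >> 25));
}

// Reassociated form: S sits between the two halves of the rotate, so no
// single OR has both shifts as its operands.
inline uint32_t accumulateReassociated(uint32_t S, uint32_t X) {
  return ((X >> 25) | S) | (X << 7);
}
```

Both accumulate functions return `S | rotl32(X, 7)` for every input; the second is the shape that a matcher looking only at the two operands of a single OR would miss.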

Diff Detail

Repository: rL LLVM

Event Timeline

kparzysz created this revision. Jun 4 2018, 12:04 PM

kparzysz added a child revision: D47725: [SelectionDAG] Provide default expansion for rotates.

Rebased on top of D47725.

kparzysz removed a child revision: D47725: [SelectionDAG] Provide default expansion for rotates. Jun 5 2018, 7:51 AM

kparzysz added a parent revision: D47725: [SelectionDAG] Provide default expansion for rotates.

efriedma added a subscriber: efriedma. Jun 6 2018, 3:26 PM

efriedma added inline comments.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5498	Please use `while (!WorkQ.empty())` instead of this for loop.
5533–5534	This check is redundant.
5538	Could you sort the nodes so that shifts with the same LHS end up together, and use that to change this loop into something that isn't O(N^2)?
5547	Use of "break" here is weird; it breaks out of the inner loop, but not the outer loop. I'd like to see a testcase with multiple rotates or'ed together.

kparzysz marked 4 inline comments as done. Jun 7 2018, 8:29 AM

Changed pair-matching to work on smaller segments (only on shifts of the same value).

Added a testcase of rotates ORed with other rotates, or with unrelated operations.
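The segment-based pair-matching mentioned in this update can be sketched on a toy model. Everything here is an assumption made for illustration: `ToyShift` stands in for a shift SDValue, `ValueId` for the shifted operand, and the "amounts sum to 32" test is only a placeholder for the real MatchRotate call. The sketch buckets shifts by shifted value first, so left/right pairs are only tried within a bucket rather than across all ORed operands.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Toy stand-in for a shift node: which value is shifted, in which direction,
// and by how much.
struct ToyShift {
  int ValueId;
  bool IsLeft;
  unsigned Amt;
};

// Returns (left index, right index) pairs that form a 32-bit rotate.
inline std::vector<std::pair<std::size_t, std::size_t>>
pairRotates(const std::vector<ToyShift> &Ops) {
  // Bucket the shifts by the value they shift.
  std::map<int, std::vector<std::size_t>> ByValue;
  for (std::size_t I = 0; I != Ops.size(); ++I)
    ByValue[Ops[I].ValueId].push_back(I);

  std::vector<bool> UsedRight(Ops.size(), false);
  std::vector<std::pair<std::size_t, std::size_t>> Pairs;
  for (const auto &Entry : ByValue) {
    const std::vector<std::size_t> &Group = Entry.second;
    for (std::size_t L : Group) {
      if (!Ops[L].IsLeft)
        continue;
      for (std::size_t R : Group) {
        if (Ops[R].IsLeft || UsedRight[R])
          continue;
        // Placeholder for the real matcher: complementary shift amounts.
        if (Ops[L].Amt + Ops[R].Amt != 32)
          continue;
        UsedRight[R] = true;
        Pairs.emplace_back(L, R);
        break; // This left shift is consumed; move on to the next one.
      }
    }
  }
  return Pairs;
}
```

With the four shifts from f2 in rotate-multi.ll (shl/lshr of %a0 by 11/21, shl/lshr of %a1 by 19/13), the sketch pairs each shl with the lshr of the same input and never compares shifts of different values.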

Herald added a subscriber: mgrang. Jun 7 2018, 8:43 AM

kparzysz retitled this revision from [SelectionDAG] Create rotates more aggressively to [DAGCombiner] Create rotates more aggressively. Jun 7 2018, 8:45 AM

kparzysz added a reviewer: RKSimon. Jun 13 2018, 11:30 AM

Ping.

Is this related to D47681?
This only has tests for Hexagon, can you please also add test[s] for X86, maybe AArch64?

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5497	You can do `std::queue<SDValue, SmallVector<SDValue, 8>> WorkQ;` to get the usual small-size-optimization benefits.
5513	I would think you'd want `DenseMap` here.

This patch and the one you mentioned coincidentally both apply to rotates, but there wasn't any coordination between them.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5497	SmallVector doesn't have pop_front, so that won't work.
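The container requirement behind this exchange can be sketched with standard containers. `std::vector` is used here as a stand-in for `SmallVector` (an assumption, so the example builds without LLVM): like the container described in the reply above, it grows and shrinks at the back and has no `pop_front()`. `std::stack` only needs `back()`, `push_back()` and `pop_back()` from its underlying container, whereas `std::queue::pop()` calls `pop_front()`.

```cpp
#include <cassert>
#include <queue>
#include <stack>
#include <vector>

// Back-only storage: no pop_front(), so it can back a std::stack but not a
// std::queue whose pop() is actually used.
using BackOnlyStorage = std::vector<int>;

// LIFO worklist on top of the back-only container; returns the visit order.
inline std::vector<int> drainStack(const std::vector<int> &Items) {
  std::stack<int, BackOnlyStorage> Work;
  for (int V : Items)
    Work.push(V);
  std::vector<int> Order;
  while (!Work.empty()) {
    Order.push_back(Work.top());
    Work.pop();
  }
  return Order;
}

// FIFO worklist with the default std::deque storage, which does provide
// pop_front(); returns the visit order.
inline std::vector<int> drainQueue(const std::vector<int> &Items) {
  std::queue<int> Work;
  for (int V : Items)
    Work.push(V);
  std::vector<int> Order;
  while (!Work.empty()) {
    Order.push_back(Work.front());
    Work.pop();
  }
  return Order;
}
```

For collecting the leaves of an OR tree the visit order changes between the two worklists, but the set of collected operands does not, which is why either adaptor could serve as the worklist.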

Responded to comments, added x86-64 testcase.

In D47735#1135475, @kparzysz wrote:

This patch and the one you mentioned coincidentally both apply to rotates, but there wasn't any coordination between them.

I guess what I was asking is whether there is any overlap. It sounds like there is none: they are working on slightly different problems, although both are related to rotates.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5497	Right, i was thinking of `std::stack`, not `queue`, sorry.
test/CodeGen/X86/rotate-multi.ll
1	X86 (and most other) tests mostly use `utils/update_llc_test_checks.py`.

kparzysz added inline comments. Jun 18 2018, 11:21 AM

test/CodeGen/X86/rotate-multi.ll
1	The output looks worse: it checks almost every instruction and has specific register names (outside of argument registers).

RKSimon added inline comments. Jun 18 2018, 12:31 PM

test/CodeGen/X86/rotate-multi.ll
1	But it's much less likely to allow mistakes to get through. Plus, on x86 it's often very useful to check all the surrounding code, as that often hides issues. Once the test file is finalised, best practice is to run update_llc_test_checks on it against trunk, commit that, and then update the patch to show the codegen diff. Also, your RUN line should use a triple, not an arch.

Updated x86 testcase.

Ping.

Please can you add the tests to trunk with the current codegen and rebase this patch to show the codegen diff.

Rebased on top of trunk.

Assuming we're not too far off from adding IR intrinsics to represent rotate ops (D49242), would transforming to those intrinsics in IR take care of the motivating problem?

At the moment D49242 expands these intrinsics into individual DAG operations. If the intrinsics were transformed into ROTL, and if instcombine could reassociate "or" operations to expose more fshl opportunities, then I guess it would be sufficient.

A benefit of having it in the DAG combiner is that it could handle IR generators that have not generated funnel shifts.

In D47735#1163728, @kparzysz wrote:

At the moment D49242 expands these intrinsics into individual DAG operations. If the intrinsics were transformed into ROTL, and if instcombine could reassociate "or" operations to expose more fshl opportunities, then I guess it would be sufficient.

A benefit of having it in the DAG combiner is that it could handle IR generators that have not generated funnel shifts.

The next steps as I see it after D49242:

1. Convert the intrinsics directly to ROTL/ROTR nodes (this should be a ~2 line patch in SelectionDAGBuilder).
2. Expose the intrinsic as clang or other front-end builtins (the first of these will be builtin_rotate* rather than the more general builtin_funnel_shift).
3. Add simplifications/folds/analysis for the intrinsics to IR passes.
4. Canonicalize to the intrinsics in instcombine.

So if these rotate ops can be created/matched sooner (and likely more easily) in IR, then I think it's a better investment to get those intrinsics into the IR rather than trying to put the patterns back together again here in the DAG.

spatel mentioned this in D49242: [Intrinsics] define funnel shift IR intrinsics + DAG builder support. Jul 16 2018, 10:23 AM

@kparzysz Do we still need this? Does the IR funnel shift work that @spatel did last year make this redundant?

Herald added a project: Restricted Project. Feb 19 2019, 10:34 AM

One goal was to be able to generate rol-and-accumulate instruction (on Hexagon), specifically for the accumulate operation being | (see f11 in rotate.ll). For the C code we still don't generate it:

unsigned blah(unsigned s, unsigned x) {
  return s | (x << 27) | (x >> 5);
}

Using clang -S -target hexagon -O2 fs.c -o - gives

{
        r0 |= asl(r1,#25)
}
{
        r0 |= lsr(r1,#7)
        jumpr r31
}

What we'd want is r0 |= rol(r1,#7).

In D47735#1404861, @kparzysz wrote:
One goal was to be able to generate rol-and-accumulate instruction (on Hexagon), specifically for the accumulate operation being | (see f11 in rotate.ll). For the C code we still don't generate it:
unsigned blah(unsigned s, unsigned x) {
  return s | (x << 27) | (x >> 5);
}

We need reassociation to happen in the IR optimizer if we want to recognize this as rotate (there could be any number of intermediate ops separating the 2 halves of the rotate):

define i32 @blah(i32 %x, i32 %s) {
  %shl = shl i32 %x, 27
  %or = or i32 %shl, %s
  %shr = lshr i32 %x, 5
  %or1 = or i32 %or, %shr ; <--- the 1st 'or' is between the rotated halves
  ret i32 %or1
}

D45842 was hoping to do something like that too. Note also that we don't currently canonicalize shl/shr/or with a constant shift amount to a rotate because that wasn't noted as a problem case, but this example suggests that we should do that.

Are you opposed to having this done in the DAG combiner?

In D47735#1406249, @kparzysz wrote:

Are you opposed to having this done in the DAG combiner?

Looking back at the comments from July 2018 - we have completed most of those tasks (in particular, we don't just expand the intrinsics now). So I think it would be nicer to get this part done with 'opt -reassociate', but I can't commit to doing that work myself immediately, so I can't be too opposed. :)

@efriedma - you started reviewing this, so I assume you're ok with a DAG patch. Does the introduction of the funnel shift intrinsics change your opinion of the implementation strategy?

I think the general approach is still fine. Given we have funnel shifts, we might want to reassociate to form funnel shifts, rather than just rotates, on targets which have native funnel shift instructions. (We'd still want to prefer rotates where possible, I think.)
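For readers less familiar with the relationship between the two operations discussed here, the following is a minimal sketch of a 32-bit funnel shift left, written to follow the documented semantics of the `llvm.fshl` intrinsic (concatenate the two inputs, shift left by the amount modulo the bit width, keep the high half). The helper names are hypothetical. A rotate is the special case where both inputs are the same value.

```cpp
#include <cassert>
#include <cstdint>

// Funnel shift left on 32 bits: conceptually shifts the 64-bit value Hi:Lo
// left by (Amt mod 32) and returns the upper 32 bits.
inline uint32_t fshl32(uint32_t Hi, uint32_t Lo, unsigned Amt) {
  Amt %= 32;
  if (Amt == 0)
    return Hi; // Avoid shifting Lo by the full bit width.
  return (Hi << Amt) | (Lo >> (32 - Amt));
}

// A rotate is a funnel shift whose two inputs are the same value.
inline uint32_t rotl32(uint32_t X, unsigned Amt) {
  return fshl32(X, X, Amt);
}
```

A matcher that forms funnel shifts would accept shift pairs of two different values, while the rotate-only matcher discussed in this review requires both shifts to act on the same value.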

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5518	getNodeId()?

Changed getIROrder to getNodeId.

This patch uses MatchRotate to do the actual matching, so it's only going to look for opportunities to create a rotate. Maybe that function should be replaced with MatchRotateOrFunnelShift in a future patch.

Ping.

Please can you commit the rotate-multi.ll test files with trunk's current codegen, and rebase this patch to show the codegen delta?

In D47735#1428262, @RKSimon wrote:

Please can you commit the rotate-multi.ll test files with trunk's current codegen, and rebase this patch to show the codegen delta?

And the additional rotate.ll tests as well - cheers!

Oof, I somehow missed the comments... :o

Committed the test cases (with checks against current trunk) in r356683.

In D47735#1438183, @kparzysz wrote:

Oof, I somehow missed the comments... :o

Committed the test cases (with checks against current trunk) in r356683.

.. and rebase this diff to show the changes?

I have a question: why do we want to do this here, in the backend?
Does the back-end itself create these patterns?
Now that we have funnel-shift, we really really should be doing this in the middle-end.
In particular, yes, the reassoc pass may need some work. That did come up previously.

Rebased on top of the pre-committed testcases.

There can be many changes to the compiled code between the IR combiner and the DAG combiner, so these patterns can certainly appear before DAG combining takes place. Also, we already combine for rotates in the DAG, this patch only makes it more comprehensive.

Ping.

RKSimon edited reviewers, added: efriedma, lebedev.ri; removed: eli.friedman. Apr 26 2019, 1:07 AM

Ping.

After thinking about it, I think it may make sense to have this after all.
I left some comments.

Also, it would be good to have some compile-time stats comparison here.
I really think you want to add some safety limits from the get-go.
E.g. would it make sense to have more than 8 levels of these ORs?
More than 256 nodes? Etc.
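One way such limits could look is sketched below on a toy expression tree. This is not code from the patch: `ToyNode`, `flattenOrs`, and the two limit parameters are hypothetical, and the values 8 and 256 are simply the ones floated in the comment above. The traversal collects the leaves of an OR tree and bails out as soon as either bound is exceeded, so the caller can skip the transform instead of paying for a very large tree.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy expression node: either an OR of two other nodes or a leaf.
struct ToyNode {
  bool IsOr;
  int Lhs; // Index of the left operand, -1 for a leaf.
  int Rhs; // Index of the right operand, -1 for a leaf.
};

// Collects the non-OR leaves reachable from Root through OR nodes. Returns
// false (Leaves is then only partially filled) if more than MaxDepth levels
// of ORs or more than MaxLeaves leaves would be needed.
inline bool flattenOrs(const std::vector<ToyNode> &Nodes, int Root,
                       unsigned MaxDepth, std::size_t MaxLeaves,
                       std::vector<int> &Leaves) {
  std::vector<std::pair<int, unsigned>> Work; // (node index, depth)
  Work.emplace_back(Root, 0u);
  while (!Work.empty()) {
    const int Id = Work.back().first;
    const unsigned Depth = Work.back().second;
    Work.pop_back();
    const ToyNode &N = Nodes[static_cast<std::size_t>(Id)];
    if (N.IsOr) {
      if (Depth == MaxDepth)
        return false; // Too many nested ORs.
      Work.emplace_back(N.Rhs, Depth + 1);
      Work.emplace_back(N.Lhs, Depth + 1);
      continue;
    }
    if (Leaves.size() == MaxLeaves)
      return false; // Too many operands.
    Leaves.push_back(Id);
  }
  return true;
}
```

Whether depth, node count, or both are the right quantities to bound is a design question the review leaves open; the sketch only shows that either check is cheap to add to the flattening loop.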

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5499–5500	Do you want to assert that you start with `ISD::OR`?
5504	This comment should be next to the loop itself. It would be best to explain the data structures here instead.
5516–5527	Can you not do this `if (Opc == ISD::SHL || Opc == ISD::SRL) OpMap[V.getOperand(0)->getNodeId()].push_back(I);` inside `} else OredOps.push_back(V);`? One less loop.
5521	// for each shifted value, create a list of shifts.
5522	This data structure really needs a better name/description.
5538–5539	I'm not sure whether or not `auto` is good here..
5541	Do we care about the order within those two groups? Is this still good in reverse-iteration mode?
5542–5545	`llvm::partition_point()`?
5547–5549	I'm not sure what is going on here, would be good to have a high-level description comment.
5552	Early `continue`?
5569	Likewise, this really needs to have a high-level description comment.
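Two of the comments above (ordering within the SHL/SRL groups, and `partition_point`) can be illustrated together. The sketch below is an assumption-laden toy, not the patch: it uses `std::stable_sort`, so the relative order inside each group is preserved, and `std::partition_point` from `<algorithm>` to find where the second group starts. `ToyOpc`, `ToyShift` and `groupShifts` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

enum class ToyOpc { Shl, Srl };

struct ToyShift {
  ToyOpc Opc;
  unsigned Amt;
};

// Groups all Shl entries before all Srl entries, keeping the original order
// inside each group, and returns the index of the first Srl entry (or the
// size of the list if there is none).
inline std::size_t groupShifts(std::vector<ToyShift> &Shifts) {
  std::stable_sort(Shifts.begin(), Shifts.end(),
                   [](const ToyShift &A, const ToyShift &B) {
                     return A.Opc < B.Opc;
                   });
  auto Boundary = std::partition_point(
      Shifts.begin(), Shifts.end(),
      [](const ToyShift &S) { return S.Opc == ToyOpc::Shl; });
  return static_cast<std::size_t>(Boundary - Shifts.begin());
}
```

With a stable sort the pairing order no longer depends on how the sort implementation arranges equal elements, which is one possible answer to the reverse-iteration question; the cost is a potentially slower sort than an unstable one.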

This revision now requires changes to proceed. Jul 10 2019, 3:29 PM

This review seems to be stuck/dead, consider abandoning if no longer relevant.

This revision now requires review to proceed. Jan 12 2023, 4:42 PM

Herald added a project: Restricted Project. Jan 12 2023, 4:42 PM

Herald added subscribers: StephenFan, pengfei.

Revision Contents

Path	Size
lib/CodeGen/SelectionDAG/DAGCombiner.cpp	95 lines
test/CodeGen/Hexagon/rotate-multi.ll	20 lines
test/CodeGen/Hexagon/rotate.ll	6 lines
test/CodeGen/X86/rotate-multi.ll	44 lines

Diff 191736

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

Show First 20 Lines • Show All 63 Lines • ▼ Show 20 Lines
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include "llvm/Target/TargetMachine.h"		#include "llvm/Target/TargetMachine.h"
#include "llvm/Target/TargetOptions.h"		#include "llvm/Target/TargetOptions.h"
#include <algorithm>		#include <algorithm>
#include <cassert>		#include <cassert>
#include <cstdint>		#include <cstdint>
#include <functional>		#include <functional>
#include <iterator>		#include <iterator>
		#include <queue>
#include <string>		#include <string>
#include <tuple>		#include <tuple>
#include <utility>		#include <utility>

using namespace llvm;		using namespace llvm;

#define DEBUG_TYPE "dagcombine"		#define DEBUG_TYPE "dagcombine"

▲ Show 20 Lines • Show All 375 Lines • ▼ Show 20 Lines	private:
SDValue buildSqrtEstimateImpl(SDValue Op, SDNodeFlags Flags, bool Recip);		SDValue buildSqrtEstimateImpl(SDValue Op, SDNodeFlags Flags, bool Recip);
SDValue buildSqrtNROneConst(SDValue Arg, SDValue Est, unsigned Iterations,		SDValue buildSqrtNROneConst(SDValue Arg, SDValue Est, unsigned Iterations,
SDNodeFlags Flags, bool Reciprocal);		SDNodeFlags Flags, bool Reciprocal);
SDValue buildSqrtNRTwoConst(SDValue Arg, SDValue Est, unsigned Iterations,		SDValue buildSqrtNRTwoConst(SDValue Arg, SDValue Est, unsigned Iterations,
SDNodeFlags Flags, bool Reciprocal);		SDNodeFlags Flags, bool Reciprocal);
SDValue MatchBSwapHWordLow(SDNode *N, SDValue N0, SDValue N1,		SDValue MatchBSwapHWordLow(SDNode *N, SDValue N0, SDValue N1,
bool DemandHighBits = true);		bool DemandHighBits = true);
SDValue MatchBSwapHWord(SDNode *N, SDValue N0, SDValue N1);		SDValue MatchBSwapHWord(SDNode *N, SDValue N0, SDValue N1);
		SDNode *ReassociateOrForRotate(SDValue Op0, SDValue Op1, const SDLoc &dl);
SDNode *MatchRotatePosNeg(SDValue Shifted, SDValue Pos, SDValue Neg,		SDNode *MatchRotatePosNeg(SDValue Shifted, SDValue Pos, SDValue Neg,
SDValue InnerPos, SDValue InnerNeg,		SDValue InnerPos, SDValue InnerNeg,
unsigned PosOpcode, unsigned NegOpcode,		unsigned PosOpcode, unsigned NegOpcode,
const SDLoc &DL);		const SDLoc &DL);
SDNode *MatchRotate(SDValue LHS, SDValue RHS, const SDLoc &DL);		SDNode *MatchRotate(SDValue LHS, SDValue RHS, const SDLoc &DL);
SDValue MatchLoadCombine(SDNode *N);		SDValue MatchLoadCombine(SDNode *N);
SDValue ReduceLoadWidth(SDNode *N);		SDValue ReduceLoadWidth(SDNode *N);
SDValue ReduceLoadOpStoreWidth(SDNode *N);		SDValue ReduceLoadOpStoreWidth(SDNode *N);
▲ Show 20 Lines • Show All 4,992 Lines • ▼ Show 20 Lines	SDValue DAGCombiner::visitOR(SDNode *N) {
// Simplify: (or (op x...), (op y...)) -> (op (or x, y))		// Simplify: (or (op x...), (op y...)) -> (op (or x, y))
if (N0.getOpcode() == N1.getOpcode())		if (N0.getOpcode() == N1.getOpcode())
if (SDValue V = hoistLogicOpWithSameOpcodeHands(N))		if (SDValue V = hoistLogicOpWithSameOpcodeHands(N))
return V;		return V;

// See if this is some rotate idiom.		// See if this is some rotate idiom.
if (SDNode *Rot = MatchRotate(N0, N1, SDLoc(N)))		if (SDNode *Rot = MatchRotate(N0, N1, SDLoc(N)))
return SDValue(Rot, 0);		return SDValue(Rot, 0);
		// If N0 or N1 are themselves ORs, there is still a potential for a rotate
		// whose parts got reassociated with some other stuff.
		if (N0.getOpcode() == ISD::OR || N1.getOpcode() == ISD::OR)
		if (SDNode *Or = ReassociateOrForRotate(N0, N1, SDLoc(N)))
		return SDValue(Or, 0);

if (SDValue Load = MatchLoadCombine(N))		if (SDValue Load = MatchLoadCombine(N))
return Load;		return Load;

// Simplify the operands using demanded-bits information.		// Simplify the operands using demanded-bits information.
if (SimplifyDemandedBits(SDValue(N, 0)))		if (SimplifyDemandedBits(SDValue(N, 0)))
return SDValue(N, 0);		return SDValue(N, 0);

return SDValue();		return SDValue();
}		}

static SDValue stripConstantMask(SelectionDAG &DAG, SDValue Op, SDValue &Mask) {		static SDValue stripConstantMask(SelectionDAG &DAG, SDValue Op, SDValue &Mask) {
if (Op.getOpcode() == ISD::AND &&		if (Op.getOpcode() == ISD::AND &&
DAG.isConstantIntBuildVectorOrConstantInt(Op.getOperand(1))) {		DAG.isConstantIntBuildVectorOrConstantInt(Op.getOperand(1))) {
Mask = Op.getOperand(1);		Mask = Op.getOperand(1);
return Op.getOperand(0);		return Op.getOperand(0);
}		}
return Op;		return Op;
}		}

		SDNode *DAGCombiner::ReassociateOrForRotate(SDValue Op0, SDValue Op1,
		const SDLoc &dl) {
		EVT VT = Op0.getValueType();
		if (!hasOperation(ISD::ROTL, VT) && !hasOperation(ISD::ROTR, VT))
		return nullptr;

		// Expand all single-use ORs into a list (OredOps).
		SmallVector<SDValue,8> OredOps;
		std::queue<SDValue> WorkQ;
		WorkQ.push(Op0);
		WorkQ.push(Op1);

		while (!WorkQ.empty()) {
		SDValue V = WorkQ.front();
		WorkQ.pop();
		if (V.getOpcode() == ISD::OR && V.hasOneUse()) {
		WorkQ.push(V.getOperand(0));
		WorkQ.push(V.getOperand(1));
		} else
		OredOps.push_back(V);
		}

		// Since only shifts of the same SDValue can end up paired up into a rotate,
		// create separate lists of shifts for each shifted value.
		DenseMap<int,SmallVector<unsigned,8>> OpMap;
		for (unsigned I = 0, E = OredOps.size(); I != E; ++I) {
		SDValue V = OredOps[I];
		unsigned Opc = V.getOpcode();
		if (Opc == ISD::SHL || Opc == ISD::SRL)
		OpMap[V.getOperand(0)->getNodeId()].push_back(I);
		}

		// Sort the shifts with respect to the opcodes. This is to group
		// the SHL operations into one contiguous block and same for SRL.
		auto OpcOrder = [&OredOps](unsigned I, unsigned J) {
		return OredOps[I].getOpcode() < OredOps[J].getOpcode();
		};

		bool CreatedRotate = false;

		for (auto P : OpMap) {
		auto &Shifts = P.second;
		assert(!Shifts.empty() && "OpMap should not have empty lists");
		llvm::sort(Shifts.begin(), Shifts.end(), OpcOrder);
		// The list of shifts should only have SHL and SRL on it grouped into
		// two contiguous segments. Find the beginning of the second segment.
		auto Boundary = std::upper_bound(std::next(Shifts.begin()), Shifts.end(),
		Shifts.front(), OpcOrder);

		for (unsigned I = 0, E = Boundary - Shifts.begin(); I != E; ++I) {
		for (unsigned J = E, F = Shifts.size(); J != F; ++J) {
		SDValue &OI = OredOps[Shifts[I]], &OJ = OredOps[Shifts[J]];
		if (!OJ)
		continue;
		if (SDNode *T = MatchRotate(OI, OJ, dl)) {
		OredOps.push_back(SDValue(T, 0));
		OI = OJ = SDValue();
		CreatedRotate = true;
		// When a rotate is created, stop the inner loop traversal, but
		// continue with the outer loop so that more opportunities for
		// rotates of the same value could be found.
		break;
		}
		}
		}
		}

		// All pairs of left-right shifts have been examined. Now, re-package
		// the values back into an OR tree.
		if (!CreatedRotate)
		return nullptr;

		auto OredEnd = remove_if(OredOps, [](SDValue V) { return !bool(V); });
		unsigned Size = OredEnd - OredOps.begin();
		while (Size != 1) {
		for (unsigned i = 0; i != Size/2; ++i)
		OredOps[i] = DAG.getNode(ISD::OR, dl, VT, OredOps[2*i], OredOps[2*i+1]);
		if (Size % 2 != 0) {
		OredOps[Size/2] = OredOps[Size-1];
		Size = Size/2 + 1;
		} else
		Size /= 2;
		}

		// The last remaining op is the root.
		return OredOps[0].getNode();
		}
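The re-packaging loop at the end of ReassociateOrForRotate can be read as a pairwise reduction. The sketch below replays the same index arithmetic on plain integers, with bitwise OR standing in for `DAG.getNode(ISD::OR, ...)`; the function name is hypothetical. Each pass ORs neighbouring elements into the front half of the list and carries an odd leftover element along, so the resulting tree is roughly balanced rather than a linear chain.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reduces a non-empty list with OR by repeatedly combining adjacent pairs in
// place. Writing to index I is safe because it only reads indices 2*I and
// 2*I+1, which are never below I.
inline uint32_t orReduceBalanced(std::vector<uint32_t> Ops) {
  assert(!Ops.empty() && "need at least one operand");
  std::size_t Size = Ops.size();
  while (Size != 1) {
    for (std::size_t I = 0; I != Size / 2; ++I)
      Ops[I] = Ops[2 * I] | Ops[2 * I + 1];
    if (Size % 2 != 0) {
      // Odd count: carry the unpaired last element into the next pass.
      Ops[Size / 2] = Ops[Size - 1];
      Size = Size / 2 + 1;
    } else {
      Size /= 2;
    }
  }
  return Ops[0];
}
```

For five operands the passes shrink the list to three, then two, then one element, which corresponds to an OR tree of depth three instead of the depth-four chain a left-to-right fold would give.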

/// Match "(X shl/srl V1) & V2" where V2 may not be present.		/// Match "(X shl/srl V1) & V2" where V2 may not be present.
static bool matchRotateHalf(SelectionDAG &DAG, SDValue Op, SDValue &Shift,		static bool matchRotateHalf(SelectionDAG &DAG, SDValue Op, SDValue &Shift,
SDValue &Mask) {		SDValue &Mask) {
Op = stripConstantMask(DAG, Op, Mask);		Op = stripConstantMask(DAG, Op, Mask);
if (Op.getOpcode() == ISD::SRL || Op.getOpcode() == ISD::SHL) {		if (Op.getOpcode() == ISD::SRL || Op.getOpcode() == ISD::SHL) {
Shift = Op;		Shift = Op;
return true;		return true;
}		}
▲ Show 20 Lines • Show All 14,267 Lines • Show Last 20 Lines

test/CodeGen/Hexagon/rotate-multi.ll

Show All 29 Lines	b0:
%v5 = or i32 %v3, %v4		%v5 = or i32 %v3, %v4
%v6 = or i32 %v2, %v5		%v6 = or i32 %v2, %v5
%v7 = or i32 %v6, %v1		%v7 = or i32 %v6, %v1
ret i32 %v6		ret i32 %v6
}		}

; OR of two rotates of two different inputs: %a0(r0) and %a1(r1).		; OR of two rotates of two different inputs: %a0(r0) and %a1(r1).
; CHECK-LABEL: f2:		; CHECK-LABEL: f2:
; CHECK: r[[R20:[0-9]+]] = asl(r0,#11)		; CHECK: r0 = rol(r0,#11)
; CHECK: r[[R21:[0-9]+]] = lsr(r0,#21)		; CHECK: r0 |= rol(r1,#19)
; CHECK: r[[R22:[0-9]+]] = lsr(r1,#13)
; CHECK: r[[R20]] |= asl(r1,#19)
; CHECK: r[[R20]] |= or(r[[R21]],r[[R22]])
define i32 @f2(i32 %a0, i32 %a1) #0 {		define i32 @f2(i32 %a0, i32 %a1) #0 {
%v0 = shl i32 %a0, 11		%v0 = shl i32 %a0, 11
%v1 = lshr i32 %a0, 21		%v1 = lshr i32 %a0, 21
%v2 = shl i32 %a1, 19		%v2 = shl i32 %a1, 19
%v3 = lshr i32 %a1, 13		%v3 = lshr i32 %a1, 13
%v4 = or i32 %v0, %v2		%v4 = or i32 %v0, %v2
%v5 = or i32 %v1, %v3		%v5 = or i32 %v1, %v3
%v6 = or i32 %v4, %v5		%v6 = or i32 %v4, %v5
ret i32 %v6		ret i32 %v6
}		}

; ORs of multiple shifts of the same value with only one pair actually		; ORs of multiple shifts of the same value with only one pair actually
; matching a rotate.		; matching a rotate.
; CHECK-LABEL: f3:		; CHECK-LABEL: f3:
; CHECK: r[[R30:[0-9]+]] = asl(r0,#3)		; CHECK-NOT: rol
; CHECK: r[[R30]] |= asl(r0,#5)		; CHECK: r[[R30:[0-9]+]] |= rol(r0,#7)
; CHECK: r[[R30]] |= asl(r0,#7)		; CHECK-NOT: rol
; CHECK: r[[R30]] |= asl(r0,#13)
; CHECK: r[[R30]] |= asl(r0,#19)
; CHECK: r[[R30]] |= lsr(r0,#2)
; CHECK: r[[R30]] |= lsr(r0,#15)
; CHECK: r[[R30]] |= lsr(r0,#23)
; CHECK: r[[R30]] |= lsr(r0,#25)
; CHECK: r[[R30]] |= lsr(r0,#30)
define i32 @f3(i32 %a0) #0 {		define i32 @f3(i32 %a0) #0 {
%v0 = shl i32 %a0, 3		%v0 = shl i32 %a0, 3
%v1 = shl i32 %a0, 5		%v1 = shl i32 %a0, 5
%v2 = shl i32 %a0, 7 ; rotate		%v2 = shl i32 %a0, 7 ; rotate
%v3 = shl i32 %a0, 13		%v3 = shl i32 %a0, 13
%v4 = shl i32 %a0, 19		%v4 = shl i32 %a0, 19
%v5 = lshr i32 %a0, 2		%v5 = lshr i32 %a0, 2
%v6 = lshr i32 %a0, 15		%v6 = lshr i32 %a0, 15
Show All 16 Lines

test/CodeGen/Hexagon/rotate.ll

Show First 20 Lines • Show All 129 Lines • ▼ Show 20 Lines	b0:
%v0 = shl i32 %a1, 7		%v0 = shl i32 %a1, 7
%v1 = lshr i32 %a1, 25		%v1 = lshr i32 %a1, 25
%v2 = or i32 %v0, %v1		%v2 = or i32 %v0, %v1
%v3 = and i32 %v2, %a0		%v3 = and i32 %v2, %a0
ret i32 %v3		ret i32 %v3
}		}

; CHECK-LABEL: f11		; CHECK-LABEL: f11
; CHECK: r0 |= lsr(r1,#25)		; CHECK: r0 |= rol(r1,#7)
; CHECK: r0 |= asl(r1,#7)
define i32 @f11(i32 %a0, i32 %a1) #0 {		define i32 @f11(i32 %a0, i32 %a1) #0 {
b0:		b0:
%v0 = shl i32 %a1, 7		%v0 = shl i32 %a1, 7
%v1 = lshr i32 %a1, 25		%v1 = lshr i32 %a1, 25
%v2 = or i32 %v1, %a0		%v2 = or i32 %v1, %a0
%v3 = or i32 %v2, %v0		%v3 = or i32 %v2, %v0
ret i32 %v3		ret i32 %v3
}		}
Show All 38 Lines	b0:
%v0 = shl i64 %a1, 7		%v0 = shl i64 %a1, 7
%v1 = lshr i64 %a1, 57		%v1 = lshr i64 %a1, 57
%v2 = or i64 %v0, %v1		%v2 = or i64 %v0, %v1
%v3 = and i64 %v2, %a0		%v3 = and i64 %v2, %a0
ret i64 %v3		ret i64 %v3
}		}

; CHECK-LABEL: f16		; CHECK-LABEL: f16
; CHECK: r1:0 |= lsr(r3:2,#57)		; CHECK: r1:0 |= rol(r3:2,#7)
; CHECK: r1:0 |= asl(r3:2,#7)
define i64 @f16(i64 %a0, i64 %a1) #0 {		define i64 @f16(i64 %a0, i64 %a1) #0 {
b0:		b0:
%v0 = shl i64 %a1, 7		%v0 = shl i64 %a1, 7
%v1 = lshr i64 %a1, 57		%v1 = lshr i64 %a1, 57
%v2 = or i64 %v1, %a0		%v2 = or i64 %v1, %a0
%v3 = or i64 %v2, %v0		%v3 = or i64 %v2, %v0
ret i64 %v3		ret i64 %v3
}		}
Show All 13 Lines

test/CodeGen/X86/rotate-multi.ll

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -mtriple=x86_64-- < %s | FileCheck %s		; RUN: llc -mtriple=x86_64-- < %s | FileCheck %s

; OR of two rotates of %a0(edi).		; OR of two rotates of %a0(edi).
define i32 @f0(i32 %a0) #0 {		define i32 @f0(i32 %a0) #0 {
; CHECK-LABEL: f0:		; CHECK-LABEL: f0:
; CHECK: # %bb.0: # %b0		; CHECK: # %bb.0: # %b0
; CHECK-NEXT: movl %edi, %eax		; CHECK-NEXT: movl %edi, %eax
; CHECK-NEXT: movl %edi, %ecx		; CHECK-NEXT: movl %edi, %ecx
; CHECK-NEXT: roll $7, %ecx		; CHECK-NEXT: roll $9, %ecx
; CHECK-NEXT: roll $9, %eax		; CHECK-NEXT: roll $7, %eax
; CHECK-NEXT: orl %ecx, %eax		; CHECK-NEXT: orl %ecx, %eax
; CHECK-NEXT: retq		; CHECK-NEXT: retq
b0:		b0:
%v0 = shl i32 %a0, 7		%v0 = shl i32 %a0, 7
%v1 = lshr i32 %a0, 25		%v1 = lshr i32 %a0, 25
%v2 = or i32 %v0, %v1		%v2 = or i32 %v0, %v1
%v3 = shl i32 %a0, 9		%v3 = shl i32 %a0, 9
%v4 = lshr i32 %a0, 23		%v4 = lshr i32 %a0, 23
Show All 24 Lines	b0:
%v7 = or i32 %v6, %v1		%v7 = or i32 %v6, %v1
ret i32 %v6		ret i32 %v6
}		}

; OR of two rotates of two different inputs: %a0(edi) and %a1(esi).		; OR of two rotates of two different inputs: %a0(edi) and %a1(esi).
define i32 @f2(i32 %a0, i32 %a1) #0 {		define i32 @f2(i32 %a0, i32 %a1) #0 {
; CHECK-LABEL: f2:		; CHECK-LABEL: f2:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: movl %esi, %eax		; CHECK-NEXT: movl %edi, %eax
; CHECK-NEXT: movl %edi, %ecx		; CHECK-NEXT: roll $19, %esi
; CHECK-NEXT: shll $11, %ecx		; CHECK-NEXT: roll $11, %eax
; CHECK-NEXT: shrl $21, %edi		; CHECK-NEXT: orl %esi, %eax
; CHECK-NEXT: movl %esi, %edx
; CHECK-NEXT: shll $19, %edx
; CHECK-NEXT: shrl $13, %eax
; CHECK-NEXT: orl %edi, %eax
; CHECK-NEXT: orl %edx, %eax
; CHECK-NEXT: orl %ecx, %eax
; CHECK-NEXT: retq		; CHECK-NEXT: retq
%v0 = shl i32 %a0, 11		%v0 = shl i32 %a0, 11
%v1 = lshr i32 %a0, 21		%v1 = lshr i32 %a0, 21
%v2 = shl i32 %a1, 19		%v2 = shl i32 %a1, 19
%v3 = lshr i32 %a1, 13		%v3 = lshr i32 %a1, 13
%v4 = or i32 %v0, %v2		%v4 = or i32 %v0, %v2
%v5 = or i32 %v1, %v3		%v5 = or i32 %v1, %v3
%v6 = or i32 %v4, %v5		%v6 = or i32 %v4, %v5
ret i32 %v6		ret i32 %v6
}		}

; ORs of multiple shifts of the same value with only one pair actually		; ORs of multiple shifts of the same value with only one pair actually
; matching a rotate.		; matching a rotate.
define i32 @f3(i32 %a0) #0 {		define i32 @f3(i32 %a0) #0 {
; CHECK-LABEL: f3:		; CHECK-LABEL: f3:
; CHECK: # %bb.0: # %b0		; CHECK: # %bb.0: # %b0
; CHECK-NEXT: # kill: def $edi killed $edi def $rdi		; CHECK-NEXT: movl %edi, %eax
; CHECK-NEXT: leal (,%rdi,8), %eax		; CHECK-NEXT: leal (,%rax,8), %ecx
; CHECK-NEXT: movl %edi, %ecx
; CHECK-NEXT: shll $5, %ecx
; CHECK-NEXT: movl %edi, %edx		; CHECK-NEXT: movl %edi, %edx
; CHECK-NEXT: shll $7, %edx		; CHECK-NEXT: shll $5, %edx
; CHECK-NEXT: orl %ecx, %edx		; CHECK-NEXT: orl %ecx, %edx
; CHECK-NEXT: movl %edi, %ecx		; CHECK-NEXT: movl %edi, %ecx
; CHECK-NEXT: shll $13, %ecx		; CHECK-NEXT: shll $13, %ecx
; CHECK-NEXT: orl %edx, %ecx		; CHECK-NEXT: orl %edx, %ecx
; CHECK-NEXT: movl %edi, %edx		; CHECK-NEXT: movl %edi, %edx
; CHECK-NEXT: shll $19, %edx		; CHECK-NEXT: shll $19, %edx
; CHECK-NEXT: orl %ecx, %edx		; CHECK-NEXT: orl %ecx, %edx
; CHECK-NEXT: movl %edi, %ecx		; CHECK-NEXT: movl %edi, %ecx
; CHECK-NEXT: shrl $2, %ecx		; CHECK-NEXT: shrl $2, %ecx
; CHECK-NEXT: orl %edx, %ecx		; CHECK-NEXT: movl %edi, %esi
; CHECK-NEXT: movl %edi, %edx		; CHECK-NEXT: shrl $15, %esi
; CHECK-NEXT: shrl $15, %edx		; CHECK-NEXT: orl %ecx, %esi
; CHECK-NEXT: orl %ecx, %edx
; CHECK-NEXT: movl %edi, %ecx		; CHECK-NEXT: movl %edi, %ecx
; CHECK-NEXT: shrl $23, %ecx		; CHECK-NEXT: shrl $23, %ecx
		; CHECK-NEXT: orl %esi, %ecx
; CHECK-NEXT: orl %edx, %ecx		; CHECK-NEXT: orl %edx, %ecx
; CHECK-NEXT: movl %edi, %edx		; CHECK-NEXT: movl %edi, %edx
; CHECK-NEXT: shrl $25, %edx		; CHECK-NEXT: shrl $30, %edx
; CHECK-NEXT: orl %ecx, %edx		; CHECK-NEXT: roll $7, %eax
; CHECK-NEXT: shrl $30, %edi		; CHECK-NEXT: orl %ecx, %eax
; CHECK-NEXT: orl %edx, %edi		; CHECK-NEXT: orl %edx, %eax
; CHECK-NEXT: orl %edi, %eax		; CHECK-NEXT: # kill: def $eax killed $eax killed $rax
; CHECK-NEXT: retq		; CHECK-NEXT: retq
b0:		b0:
%v0 = shl i32 %a0, 3		%v0 = shl i32 %a0, 3
%v1 = shl i32 %a0, 5		%v1 = shl i32 %a0, 5
%v2 = shl i32 %a0, 7 ; rotate		%v2 = shl i32 %a0, 7 ; rotate
%v3 = shl i32 %a0, 13		%v3 = shl i32 %a0, 13
%v4 = shl i32 %a0, 19		%v4 = shl i32 %a0, 19
%v5 = lshr i32 %a0, 2		%v5 = lshr i32 %a0, 2
Show All 17 Lines
