This is an archive of the discontinued LLVM Phabricator instance.

Differential D47735

[DAGCombiner] Create rotates more aggressively
Needs Review · Public

Authored by kparzysz on Jun 4 2018, 12:04 PM.

Details

Reviewers

spatel
RKSimon
efriedma
lebedev.ri

Summary

The DAG combiner can recognize a pattern of ORed shifts that evaluate to a bit rotation. When the rotation is ORed with another value, the OR operations can get reassociated in such a way that the rotation will no longer be identified. This patch implements a more aggressive analysis of OR operations to detect rotation patterns.
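As an illustration of the problem described in the summary, here is a minimal standalone sketch (plain C++, not LLVM code; the names `rotl32`, `adjacent` and `reassociated` are made up for this example). Both functions compute the same value, but only in the first one do the two rotate halves feed the same OR.

```cpp
#include <cassert>
#include <cstdint>

// Rotate a 32-bit value left by N bits, written as the OR of two
// opposing shifts of the same value.
inline uint32_t rotl32(uint32_t X, unsigned N) {
  assert(N > 0 && N < 32 && "shift amounts must stay in range");
  return (X << N) | (X >> (32 - N));
}

// The rotate halves are operands of the same OR, so a matcher that
// inspects a single OR node sees the whole rotate.
inline uint32_t adjacent(uint32_t S, uint32_t X) {
  return S | ((X << 7) | (X >> 25));
}

// Reassociated form: each OR sees only one half of the rotate, with
// the unrelated value S in between.
inline uint32_t reassociated(uint32_t S, uint32_t X) {
  return ((X << 7) | S) | (X >> 25);
}
```

Since OR is associative and commutative, `adjacent(S, X)` and `reassociated(S, X)` are equal for all inputs; only the shape of the expression differs.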

Diff Detail

Repository: rL LLVM

Event Timeline

kparzysz created this revision. Jun 4 2018, 12:04 PM

kparzysz added a child revision: D47725: [SelectionDAG] Provide default expansion for rotates.

Rebased on top of D47725.

kparzysz removed a child revision: D47725: [SelectionDAG] Provide default expansion for rotates. Jun 5 2018, 7:51 AM

kparzysz added a parent revision: D47725: [SelectionDAG] Provide default expansion for rotates.

efriedma added a subscriber: efriedma. Jun 6 2018, 3:26 PM

efriedma added inline comments.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4920	Please use `while (!WorkQ.empty())` instead of this for loop.
4955–4956	This check is redundant.
4960	Could you sort the nodes so that shifts with the same LHS end up together, and use that to change this loop into something that isn't O(N^2)?
4969	Use of "break" here is weird; it breaks out of the inner loop, but not the outer loop. I'd like to see a testcase with multiple rotates or'ed together.

kparzysz marked 4 inline comments as done. Jun 7 2018, 8:29 AM

Changed pair-matching to work on smaller segments (only on shifts of the same value).
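A rough standalone sketch of the idea of matching only within shifts of the same value (this is not the patch's code: `Shift`, `pairRotates` and the integer `ValueId` are hypothetical stand-ins, and the real patch defers the actual matching to MatchRotate):

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a shift node: which value is shifted,
// in which direction, and by how many bits.
struct Shift {
  int ValueId;
  bool IsLeft;
  unsigned Amount;
};

// Returns (left, right) index pairs into Ops that together form a
// rotate of a BitWidth-bit value. Only shifts of the same value are
// ever compared with each other.
std::vector<std::pair<size_t, size_t>>
pairRotates(const std::vector<Shift> &Ops, unsigned BitWidth) {
  assert(BitWidth > 0 && "bit width must be positive");

  // Bucket the shift indices by the value being shifted.
  std::map<int, std::vector<size_t>> ByValue;
  for (size_t I = 0; I != Ops.size(); ++I)
    ByValue[Ops[I].ValueId].push_back(I);

  std::vector<std::pair<size_t, size_t>> Pairs;
  std::vector<bool> Used(Ops.size(), false);
  for (const auto &Entry : ByValue) {
    const std::vector<size_t> &Bucket = Entry.second;
    for (size_t L : Bucket) {
      if (!Ops[L].IsLeft)
        continue;
      for (size_t R : Bucket) {
        if (Ops[R].IsLeft || Used[R])
          continue;
        // Constant amounts form a rotate when they add up to the width.
        if (Ops[L].Amount + Ops[R].Amount != BitWidth)
          continue;
        Used[R] = true;
        Pairs.emplace_back(L, R);
        break; // this left shift is consumed; try the next one
      }
    }
  }
  return Pairs;
}
```

The pairing is still quadratic within a bucket, but each bucket only holds the shifts of one value, so unrelated shifts never get compared.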

Added a testcase of rotates ORed with other rotates, or with unrelated operations.

Herald added a subscriber: mgrang. Jun 7 2018, 8:43 AM

kparzysz retitled this revision from [SelectionDAG] Create rotates more aggressively to [DAGCombiner] Create rotates more aggressively. Jun 7 2018, 8:45 AM

kparzysz added a reviewer: RKSimon. Jun 13 2018, 11:30 AM

Ping.

Is this related to D47681 ?
This only has tests for Hexagon, can you please also add test[s] for X86, maybe AArch64?

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4919	You can do std::queue<SDValue, SmallVector<SDValue, 8>> WorkQ; to get the usual small-size-optimization benefits.
4935	I would think you'd want `DenseMap` here.

This patch and the one you mentioned coincidentally both apply to rotates, but there wasn't any coordination between them.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4919	SmallVector doesn't have pop_front, so that won't work.

Responded to comments, added x86-64 testcase.

In D47735#1135475, @kparzysz wrote:

This patch and the one you mentioned coincidentally both apply to rotates, but there wasn't any coordination between them.

I guess what I was asking is: there is no overlap, and they are working on slightly different problems, although both are related to rotates?

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4919	Right, i was thinking of `std::stack`, not `queue`, sorry.
test/CodeGen/X86/rotate-multi.ll
2	X86 (and most other) tests mostly use `utils/update_llc_test_checks.py`.
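The container-adaptor point from the line 4919 exchange can be sketched with standard containers only. Here `std::vector` stands in for `SmallVector`, and the function names are hypothetical:

```cpp
#include <cassert>
#include <deque>
#include <queue>
#include <stack>
#include <vector>

// std::stack only needs back(), push_back() and pop_back() from its
// underlying container, so a vector-like container is enough.
inline int drainStack() {
  std::stack<int, std::vector<int>> Work;
  Work.push(1);
  Work.push(2);
  assert(Work.size() == 2);
  int Last = 0;
  while (!Work.empty()) {
    Last = Work.top();
    Work.pop();
  }
  return Last; // LIFO order: the first element pushed is popped last
}

// std::queue::pop() calls pop_front() on the underlying container,
// which is why it is paired with std::deque here.
inline int drainQueue() {
  std::queue<int, std::deque<int>> Work;
  Work.push(1);
  Work.push(2);
  assert(Work.size() == 2);
  int Last = 0;
  while (!Work.empty()) {
    Last = Work.front();
    Work.pop();
  }
  return Last; // FIFO order: the last element pushed is popped last
}
```

For flattening an OR tree the visiting order should not change which leaves are collected, so a stack-based worklist looks like a workable alternative to the queue, though the thread does not say which one the patch ended up with.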

kparzysz added inline comments. Jun 18 2018, 11:21 AM

test/CodeGen/X86/rotate-multi.ll
2	The output looks worse: it checks almost every instruction and has specific register names (outside of argument registers).

RKSimon added inline comments. Jun 18 2018, 12:31 PM

test/CodeGen/X86/rotate-multi.ll
2	But it's much less likely to allow mistakes to get through. Plus on x86 it's often very useful to check all the surrounding code as that often hides issues. Once the test file is finalised best practice is to use update_llc_test_checks on it against trunk, commit that and then update the patch to show the codegen diff. Also, your RUN line should use a triple, not an arch.

Updated x86 testcase.

Ping.

Please can you add the tests to trunk with the current codegen and rebase this patch to show the codegen diff.

Rebased on top of trunk.

Assuming we're not too far off from adding IR intrinsics to represent rotate ops (D49242), would transforming to those intrinsics in IR take care of the motivating problem?

At the moment D49242 expands these intrinsics into individual DAG operations. If the intrinsics were transformed into ROTL, and if instcombine could reassociate "or" operations to expose more fshl opportunities, then I guess it would be sufficient.

A benefit of having it in the DAG combiner is that it could handle IR generators that have not generated funnel shifts.

In D47735#1163728, @kparzysz wrote:

At the moment D49242 expands these intrinsics into individual DAG operations. If the intrinsics were transformed into ROTL, and if instcombine could reassociate "or" operations to expose more fshl opportunities, then I guess it would be sufficient.

A benefit of having it in the DAG combiner is that it could handle IR generators that have not generated funnel shifts.

The next steps as I see it after D49242:

1. Convert the intrinsics directly to ROTL/ROTR nodes (this should be a ~2 line patch in SelectionDAGBuilder).
2. Expose the intrinsic as clang or other front-end builtins (the first of these will be builtin_rotate* rather than the more general builtin_funnel_shift).
3. Add simplifications/folds/analysis for the intrinsics to IR passes.
4. Canonicalize to the intrinsics in instcombine.

So if these rotate ops can be created/matched sooner (and likely more easily) in IR, then I think it's a better investment to get those intrinsics into the IR rather than trying to put the patterns back together again here in the DAG.
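For readers unfamiliar with funnel shifts, here is a small standalone model of the relationship the discussion relies on: a rotate is a funnel shift whose two inputs are the same value. This is a sketch in plain C++ that follows the documented behaviour of the `fshl` intrinsic (shift amount taken modulo the bit width); `fshl32`, `rotl32` and `checkAgainstShiftIdiom` are names invented for the example.

```cpp
#include <cassert>
#include <cstdint>

// Funnel shift left: concatenate Hi:Lo into 64 bits, shift left by
// N modulo 32, and return the upper 32 bits of the result.
inline uint32_t fshl32(uint32_t Hi, uint32_t Lo, unsigned N) {
  uint64_t Concat = (static_cast<uint64_t>(Hi) << 32) | Lo;
  return static_cast<uint32_t>((Concat << (N % 32)) >> 32);
}

// A rotate left is the special case where both inputs are the same.
inline uint32_t rotl32(uint32_t X, unsigned N) {
  return fshl32(X, X, N);
}

// Spot check: rotating by 7 matches the shl/lshr/or idiom used in
// the tests of this patch.
inline void checkAgainstShiftIdiom(uint32_t X) {
  assert(rotl32(X, 7) == ((X << 7) | (X >> 25)));
}
```

Unlike the shift idiom, the funnel-shift form is also well defined for a shift amount of 0, where it simply returns the first input.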

spatel mentioned this in D49242: [Intrinsics] define funnel shift IR intrinsics + DAG builder support. Jul 16 2018, 10:23 AM

@kparzysz Do we still need this? Does the IR funnel shift work that @spatel did last year make this redundant?

Herald added a project: Restricted Project. Feb 19 2019, 10:34 AM

One goal was to be able to generate rol-and-accumulate instruction (on Hexagon), specifically for the accumulate operation being | (see f11 in rotate.ll). For the C code we still don't generate it:

unsigned blah(unsigned s, unsigned x) {
  return s | (x << 27) | (x >> 5);
}

Using clang -S -target hexagon -O2 fs.c -o - gives

{
        r0 |= asl(r1,#25)
}
{
        r0 |= lsr(r1,#7)
        jumpr r31
}

What we'd want is r0 |= rol(r1,#7).

In D47735#1404861, @kparzysz wrote:
One goal was to be able to generate rol-and-accumulate instruction (on Hexagon), specifically for the accumulate operation being | (see f11 in rotate.ll). For the C code we still don't generate it:
unsigned blah(unsigned s, unsigned x) {
  return s | (x << 27) | (x >> 5);
}

We need reassociation to happen in the IR optimizer if we want to recognize this as rotate (there could be any number of intermediate ops separating the 2 halves of the rotate):

define i32 @blah(i32 %x, i32 %s) {
  %shl = shl i32 %x, 27
  %or = or i32 %shl, %s
  %shr = lshr i32 %x, 5
  %or1 = or i32 %or, %shr ; <--- the 1st 'or' is between the rotated halves
  ret i32 %or1
}

D45842 was hoping to do something like that too. Note also that we don't currently canonicalize shl/shr/or with a constant shift amount to a rotate because that wasn't noted as a problem case, but this example suggests that we should do that.

Are you opposed to having this done in the DAG combiner?

In D47735#1406249, @kparzysz wrote:

Are you opposed to having this done in the DAG combiner?

Looking back at the comments from July 2018 - we have completed most of those tasks (in particular, we don't just expand the intrinsics now). So I think it would be nicer to get this part done with 'opt -reassociate', but I can't commit to doing that work myself immediately, so I can't be too opposed. :)

@efriedma - you started reviewing this, so I assume you're ok with a DAG patch. Does the introduction of the funnel shift intrinsics change your opinion of the implementation strategy?

I think the general approach is still fine. Given we have funnel shifts, we might want to reassociate to form funnel shifts, rather than just rotates, on targets which have native funnel shift instructions. (We'd still want to prefer rotates where possible, I think.)

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4940	getNodeId()?

Changed getIROrder to getNodeId.

This patch uses MatchRotate to do the actual matching, so it's only going to look for opportunities to create a rotate. Maybe that function should be replaced with MatchRotateOrFunnelShift in a future patch.

Ping.

Please can you commit the rotate-multi.ll test files with trunk's current codegen, and rebase this patch to show the codegen delta?

In D47735#1428262, @RKSimon wrote:

Please can you commit the rotate-multi.ll test files with trunk's current codegen, and rebase this patch to show the codegen delta?

And the additional rotate.ll tests as well - cheers!

Oof, I somehow missed the comments... :o

Committed the test cases (with checks against current trunk) in r356683.

In D47735#1438183, @kparzysz wrote:

Oof, I somehow missed the comments... :o

Committed the test cases (with checks against current trunk) in r356683.

.. and rebase this diff to show the changes?

I have a question: why do we want to do this here, in the backend?
Does the back-end itself create these patterns?
Now that we have funnel-shift, we really really should be doing this in the middle-end.
In particular, yes, the reassoc pass may need some work. That did come up previously.

Rebased on top of the pre-committed testcases.

There can be many changes to the compiled code between the IR combiner and the DAG combiner, so these patterns can certainly appear before DAG combining takes place. Also, we already combine for rotates in the DAG, this patch only makes it more comprehensive.

Ping.

RKSimon edited reviewers, added: efriedma, lebedev.ri; removed: eli.friedman. Apr 26 2019, 1:07 AM

Ping.

After thinking about it, i think it may make sense to have this after all.
I left some comments.

Also, it would be good to have some compile-time stats comparison here.
I really think you want to add some safety limits from the get-go.
E.g. would it make sense to have more than 8 levels of these ors?
More than 256 nodes? Etc.
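One possible shape for such limits, sketched on a toy expression tree rather than on SDValues (the `Node` type, `flattenOrs` and both limit parameters are hypothetical; the patch's single-use check is left out for brevity):

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical expression node: either an OR of two children or a leaf.
struct Node {
  bool IsOr;
  const Node *LHS;
  const Node *RHS;
};

// Collect the leaves of an OR tree breadth-first. Gives up (returns
// false) when an OR is found MaxDepth levels below the root, or when
// more than MaxNodes nodes have been visited.
bool flattenOrs(const Node *Root, unsigned MaxDepth, size_t MaxNodes,
                std::vector<const Node *> &Leaves) {
  assert(Root && "expected a non-null root");
  std::queue<std::pair<const Node *, unsigned>> Work;
  Work.emplace(Root, 0u);
  size_t Visited = 0;
  while (!Work.empty()) {
    const Node *N = Work.front().first;
    unsigned Depth = Work.front().second;
    Work.pop();
    if (++Visited > MaxNodes)
      return false; // too many nodes: bail out
    if (!N->IsOr) {
      Leaves.push_back(N);
      continue;
    }
    if (Depth == MaxDepth)
      return false; // OR tree is too deep: bail out
    assert(N->LHS && N->RHS && "OR nodes need two operands");
    Work.emplace(N->LHS, Depth + 1);
    Work.emplace(N->RHS, Depth + 1);
  }
  return true;
}
```

With the values floated in the comment, a call would look like `flattenOrs(Root, 8, 256, Leaves)`; a caller that gets `false` back would simply skip the transformation.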

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5011–5012	Do you want to assert that you start with `ISD::OR`?
5016	This comment should be next to the loop itself. It would be best to explain the data structures here instead.
5028–5039	Can you not do this `if (Opc == ISD::SHL || Opc == ISD::SRL) OpMap[V.getOperand(0)->getNodeId()].push_back(I);` inside `} else OredOps.push_back(V);`? One less loop.
5033	// for each shifted value, create a list of shifts.
5034	This data structure really needs a better name/description.
5050–5051	I'm not sure whether or not `auto` is good here..
5053	Do we care about the order within those two groups? Is this still good in reverse-iteration mode?
5054–5057	`llvm::partition_point()`?
5059–5061	I'm not sure what is going on here, would be good to have a high-level description comment.
5064	Early `continue`?
5081	Likewise, this really needs to have a high-level description comment.
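To illustrate the `partition_point` suggestion for lines 5054–5057, here is a standalone sketch using `std::partition_point` from `<algorithm>` (the `Opcode` enum and `splitByOpcode` are hypothetical stand-ins, not names from the patch). Once the shifts are grouped by opcode, the boundary between the two groups is a partition point of the predicate "is a left shift".

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for ISD::SHL and ISD::SRL.
enum Opcode { SHL, SRL };

// Group the shifts so that all SHL come first, then return the index
// of the first SRL (or Shifts.size() if there is none).
size_t splitByOpcode(std::vector<Opcode> &Shifts) {
  std::sort(Shifts.begin(), Shifts.end());
  auto IsLeft = [](Opcode Op) { return Op == SHL; };
  // partition_point requires the range to be partitioned already.
  assert(std::is_partitioned(Shifts.begin(), Shifts.end(), IsLeft));
  auto Boundary =
      std::partition_point(Shifts.begin(), Shifts.end(), IsLeft);
  return static_cast<size_t>(Boundary - Shifts.begin());
}
```

Compared with `std::upper_bound` on the first element, this states the intent directly and also behaves sensibly when the list contains only one kind of shift.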

This revision now requires changes to proceed. Jul 10 2019, 3:29 PM

This review seems to be stuck/dead, consider abandoning if no longer relevant.

This revision now requires review to proceed. Jan 12 2023, 4:42 PM

Herald added a project: Restricted Project. Jan 12 2023, 4:42 PM

Herald added subscribers: StephenFan, pengfei.

Revision Contents

Path	Size
lib/CodeGen/SelectionDAG/DAGCombiner.cpp	95 lines
test/CodeGen/Hexagon/rotate-multi.ll	80 lines
test/CodeGen/Hexagon/rotate.ll	22 lines
test/CodeGen/X86/rotate-multi.ll	121 lines

Diff 151777

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include "llvm/Target/TargetMachine.h"		#include "llvm/Target/TargetMachine.h"
#include "llvm/Target/TargetOptions.h"		#include "llvm/Target/TargetOptions.h"
#include <algorithm>		#include <algorithm>
#include <cassert>		#include <cassert>
#include <cstdint>		#include <cstdint>
#include <functional>		#include <functional>
#include <iterator>		#include <iterator>
		#include <queue>
#include <string>		#include <string>
#include <tuple>		#include <tuple>
#include <utility>		#include <utility>
#include <vector>		#include <vector>

using namespace llvm;		using namespace llvm;

#define DEBUG_TYPE "dagcombine"		#define DEBUG_TYPE "dagcombine"
▲ Show 20 Lines • Show All 355 Lines • ▼ Show 20 Lines	private:
SDValue buildSqrtEstimateImpl(SDValue Op, SDNodeFlags Flags, bool Recip);		SDValue buildSqrtEstimateImpl(SDValue Op, SDNodeFlags Flags, bool Recip);
SDValue buildSqrtNROneConst(SDValue Op, SDValue Est, unsigned Iterations,		SDValue buildSqrtNROneConst(SDValue Op, SDValue Est, unsigned Iterations,
SDNodeFlags Flags, bool Reciprocal);		SDNodeFlags Flags, bool Reciprocal);
SDValue buildSqrtNRTwoConst(SDValue Op, SDValue Est, unsigned Iterations,		SDValue buildSqrtNRTwoConst(SDValue Op, SDValue Est, unsigned Iterations,
SDNodeFlags Flags, bool Reciprocal);		SDNodeFlags Flags, bool Reciprocal);
SDValue MatchBSwapHWordLow(SDNode *N, SDValue N0, SDValue N1,		SDValue MatchBSwapHWordLow(SDNode *N, SDValue N0, SDValue N1,
bool DemandHighBits = true);		bool DemandHighBits = true);
SDValue MatchBSwapHWord(SDNode *N, SDValue N0, SDValue N1);		SDValue MatchBSwapHWord(SDNode *N, SDValue N0, SDValue N1);
		SDNode *ReassociateOrForRotate(SDValue Op0, SDValue Op1, const SDLoc &dl);
SDNode *MatchRotatePosNeg(SDValue Shifted, SDValue Pos, SDValue Neg,		SDNode *MatchRotatePosNeg(SDValue Shifted, SDValue Pos, SDValue Neg,
SDValue InnerPos, SDValue InnerNeg,		SDValue InnerPos, SDValue InnerNeg,
unsigned PosOpcode, unsigned NegOpcode,		unsigned PosOpcode, unsigned NegOpcode,
const SDLoc &DL);		const SDLoc &DL);
SDNode *MatchRotate(SDValue LHS, SDValue RHS, const SDLoc &DL);		SDNode *MatchRotate(SDValue LHS, SDValue RHS, const SDLoc &DL);
SDValue MatchLoadCombine(SDNode *N);		SDValue MatchLoadCombine(SDNode *N);
SDValue ReduceLoadWidth(SDNode *N);		SDValue ReduceLoadWidth(SDNode *N);
SDValue ReduceLoadOpStoreWidth(SDNode *N);		SDValue ReduceLoadOpStoreWidth(SDNode *N);
▲ Show 20 Lines • Show All 4,434 Lines • ▼ Show 20 Lines	SDValue DAGCombiner::visitOR(SDNode *N) {
// Simplify: (or (op x...), (op y...)) -> (op (or x, y))		// Simplify: (or (op x...), (op y...)) -> (op (or x, y))
if (N0.getOpcode() == N1.getOpcode())		if (N0.getOpcode() == N1.getOpcode())
if (SDValue Tmp = SimplifyBinOpWithSameOpcodeHands(N))		if (SDValue Tmp = SimplifyBinOpWithSameOpcodeHands(N))
return Tmp;		return Tmp;

// See if this is some rotate idiom.		// See if this is some rotate idiom.
if (SDNode *Rot = MatchRotate(N0, N1, SDLoc(N)))		if (SDNode *Rot = MatchRotate(N0, N1, SDLoc(N)))
return SDValue(Rot, 0);		return SDValue(Rot, 0);
		// If N0 or N1 are themselves ORs, there is still a potential for a rotate
		// whose parts got reassociated with some other stuff.
		if (N0.getOpcode() == ISD::OR || N1.getOpcode() == ISD::OR)
		if (SDNode *Or = ReassociateOrForRotate(N0, N1, SDLoc(N)))
		return SDValue(Or, 0);

if (SDValue Load = MatchLoadCombine(N))		if (SDValue Load = MatchLoadCombine(N))
return Load;		return Load;

// Simplify the operands using demanded-bits information.		// Simplify the operands using demanded-bits information.
if (SimplifyDemandedBits(SDValue(N, 0)))		if (SimplifyDemandedBits(SDValue(N, 0)))
return SDValue(N, 0);		return SDValue(N, 0);

return SDValue();		return SDValue();
}		}

		SDNode *DAGCombiner::ReassociateOrForRotate(SDValue Op0, SDValue Op1,
		const SDLoc &dl) {
		EVT VT = Op0.getValueType();
		if (!hasOperation(ISD::ROTL, VT) && !hasOperation(ISD::ROTR, VT))
		return nullptr;

		// Expand all single-use ORs into a list (OredOps).
		SmallVector<SDValue,8> OredOps;
		std::queue<SDValue> WorkQ;
		WorkQ.push(Op0);
		WorkQ.push(Op1);

		while (!WorkQ.empty()) {
		SDValue V = WorkQ.front();
		WorkQ.pop();
		if (V.getOpcode() == ISD::OR && V.hasOneUse()) {
		WorkQ.push(V.getOperand(0));
		WorkQ.push(V.getOperand(1));
		} else
		OredOps.push_back(V);
		}

		// Since only shifts of the same SDValue can end up paired up into a rotate,
		// create separate lists of shifts for each shifted value.
		DenseMap<int,SmallVector<unsigned,8>> OpMap;
		for (unsigned I = 0, E = OredOps.size(); I != E; ++I) {
		SDValue V = OredOps[I];
		unsigned Opc = V.getOpcode();
		if (Opc == ISD::SHL || Opc == ISD::SRL)
		OpMap[V.getOperand(0)->getIROrder()].push_back(I);
		}

		// Sort the shifts with respect to the opcodes. This is to group
		// the SHL operations into one contiguous block and same for SRL.
		auto OpcOrder = [&OredOps](unsigned I, unsigned J) {
		return OredOps[I].getOpcode() < OredOps[J].getOpcode();
		};

		bool CreatedRotate = false;

		for (auto P : OpMap) {
		auto &Shifts = P.second;
		assert(!Shifts.empty() && "OpMap should not have empty lists");
		llvm::sort(Shifts.begin(), Shifts.end(), OpcOrder);
		// The list of shifts should only have SHL and SRL on it grouped into
		// two contiguous segments. Find the beginning of the second segment.
		auto Boundary = std::upper_bound(std::next(Shifts.begin()), Shifts.end(),
		Shifts.front(), OpcOrder);

		for (unsigned I = 0, E = Boundary - Shifts.begin(); I != E; ++I) {
		for (unsigned J = E, F = Shifts.size(); J != F; ++J) {
		SDValue &OI = OredOps[Shifts[I]], &OJ = OredOps[Shifts[J]];
		if (!OJ)
		continue;
		if (SDNode *T = MatchRotate(OI, OJ, dl)) {
		OredOps.push_back(SDValue(T, 0));
		OI = OJ = SDValue();
		CreatedRotate = true;
		// When a rotate is created, stop the inner loop traversal, but
		// continue with the outer loop so that more opportunities for
		// rotates of the same value could be found.
		break;
		}
		}
		}
		}

		// All pairs of left-right shifts have been examined. Now, re-package
		// the values back into an OR tree.
		if (!CreatedRotate)
		return nullptr;

		auto OredEnd = remove_if(OredOps, [](SDValue V) { return !bool(V); });
		unsigned Size = OredEnd - OredOps.begin();
		while (Size != 1) {
		for (unsigned i = 0; i != Size/2; ++i)
		OredOps[i] = DAG.getNode(ISD::OR, dl, VT, OredOps[2*i], OredOps[2*i+1]);
		if (Size % 2 != 0) {
		OredOps[Size/2] = OredOps[Size-1];
		Size = Size/2 + 1;
		} else
		Size /= 2;
		}

		// The last remaining op is the root.
		return OredOps[0].getNode();
		}

/// Match "(X shl/srl V1) & V2" where V2 may not be present.		/// Match "(X shl/srl V1) & V2" where V2 may not be present.
bool DAGCombiner::MatchRotateHalf(SDValue Op, SDValue &Shift, SDValue &Mask) {		bool DAGCombiner::MatchRotateHalf(SDValue Op, SDValue &Shift, SDValue &Mask) {
if (Op.getOpcode() == ISD::AND) {		if (Op.getOpcode() == ISD::AND) {
if (DAG.isConstantIntBuildVectorOrConstantInt(Op.getOperand(1))) {		if (DAG.isConstantIntBuildVectorOrConstantInt(Op.getOperand(1))) {
Mask = Op.getOperand(1);		Mask = Op.getOperand(1);
Op = Op.getOperand(0);		Op = Op.getOperand(0);
} else {		} else {
return false;		return false;
}		}
}		}

if (Op.getOpcode() == ISD::SRL || Op.getOpcode() == ISD::SHL) {		if (Op.getOpcode() == ISD::SRL || Op.getOpcode() == ISD::SHL) {
Shift = Op;		Shift = Op;
return true;		return true;
}		}

return false;		return false;
}		}

// Return true if we can prove that, whenever Neg and Pos are both in the		// Return true if we can prove that, whenever Neg and Pos are both in the
// range [0, EltSize), Neg == (Pos == 0 ? 0 : EltSize - Pos). This means that		// range [0, EltSize), Neg == (Pos == 0 ? 0 : EltSize - Pos). This means that
// for two opposing shifts shift1 and shift2 and a value X with OpBits bits:		// for two opposing shifts shift1 and shift2 and a value X with OpBits bits:
//		//
// (or (shift1 X, Neg), (shift2 X, Pos))		// (or (shift1 X, Neg), (shift2 X, Pos))
//		//
// reduces to a rotate in direction shift2 by Pos or (equivalently) a rotate		// reduces to a rotate in direction shift2 by Pos or (equivalently) a rotate
// in direction shift1 by Neg. The range [0, EltSize) means that we only need		// in direction shift1 by Neg. The range [0, EltSize) means that we only need
// to consider shift amounts with defined behavior.		// to consider shift amounts with defined behavior.
static bool matchRotateSub(SDValue Pos, SDValue Neg, unsigned EltSize,		static bool matchRotateSub(SDValue Pos, SDValue Neg, unsigned EltSize,
SelectionDAG &DAG) {		SelectionDAG &DAG) {
// If EltSize is a power of 2 then:		// If EltSize is a power of 2 then:
//		//
// (a) (Pos == 0 ? 0 : EltSize - Pos) == (EltSize - Pos) & (EltSize - 1)		// (a) (Pos == 0 ? 0 : EltSize - Pos) == (EltSize - Pos) & (EltSize - 1)
// (b) Neg == Neg & (EltSize - 1) whenever Neg is in [0, EltSize).		// (b) Neg == Neg & (EltSize - 1) whenever Neg is in [0, EltSize).
//		//
// So if EltSize is a power of 2 and Neg is (and Neg', EltSize-1), we check		// So if EltSize is a power of 2 and Neg is (and Neg', EltSize-1), we check
// for the stronger condition:		// for the stronger condition:
//		//
// Neg & (EltSize - 1) == (EltSize - Pos) & (EltSize - 1) [A]		// Neg & (EltSize - 1) == (EltSize - Pos) & (EltSize - 1) [A]
//		//
// for all Neg and Pos. Since Neg & (EltSize - 1) == Neg' & (EltSize - 1)		// for all Neg and Pos. Since Neg & (EltSize - 1) == Neg' & (EltSize - 1)
// we can just replace Neg with Neg' for the rest of the function.		// we can just replace Neg with Neg' for the rest of the function.
//		//
// In other cases we check for the even stronger condition:		// In other cases we check for the even stronger condition:
//		//
// Neg == EltSize - Pos [B]		// Neg == EltSize - Pos [B]
//		//
// for all Neg and Pos. Note that the (or ...) then invokes undefined		// for all Neg and Pos. Note that the (or ...) then invokes undefined
// behavior if Pos == 0 (and consequently Neg == EltSize).		// behavior if Pos == 0 (and consequently Neg == EltSize).
//		//
// We could actually use [A] whenever EltSize is a power of 2, but the		// We could actually use [A] whenever EltSize is a power of 2, but the
// only extra cases that it would match are those uninteresting ones		// only extra cases that it would match are those uninteresting ones
// where Neg and Pos are never in range at the same time. E.g. for		// where Neg and Pos are never in range at the same time. E.g. for
// EltSize == 32, using [A] would allow a Neg of the form (sub 64, Pos)		// EltSize == 32, using [A] would allow a Neg of the form (sub 64, Pos)
// as well as (sub 32, Pos), but:		// as well as (sub 32, Pos), but:
//		//
// (or (shift1 X, (sub 64, Pos)), (shift2 X, Pos))		// (or (shift1 X, (sub 64, Pos)), (shift2 X, Pos))
//		//
// always invokes undefined behavior for 32-bit X.		// always invokes undefined behavior for 32-bit X.
//		//
// Below, Mask == EltSize - 1 when using [A] and is all-ones otherwise.		// Below, Mask == EltSize - 1 when using [A] and is all-ones otherwise.
unsigned MaskLoBits = 0;		unsigned MaskLoBits = 0;
if (Neg.getOpcode() == ISD::AND && isPowerOf2_64(EltSize)) {		if (Neg.getOpcode() == ISD::AND && isPowerOf2_64(EltSize)) {
if (ConstantSDNode *NegC = isConstOrConstSplat(Neg.getOperand(1))) {		if (ConstantSDNode *NegC = isConstOrConstSplat(Neg.getOperand(1))) {
KnownBits Known;		KnownBits Known;
DAG.computeKnownBits(Neg.getOperand(0), Known);		DAG.computeKnownBits(Neg.getOperand(0), Known);
unsigned Bits = Log2_64(EltSize);		unsigned Bits = Log2_64(EltSize);
if (NegC->getAPIntValue().getActiveBits() <= Bits &&		if (NegC->getAPIntValue().getActiveBits() <= Bits &&
((NegC->getAPIntValue() | Known.Zero).countTrailingOnes() >= Bits)) {		((NegC->getAPIntValue() | Known.Zero).countTrailingOnes() >= Bits)) {
Neg = Neg.getOperand(0);		Neg = Neg.getOperand(0);
MaskLoBits = Bits;		MaskLoBits = Bits;
}		}
}		}
}		}

// Check whether Neg has the form (sub NegC, NegOp1) for some NegC and NegOp1.		// Check whether Neg has the form (sub NegC, NegOp1) for some NegC and NegOp1.
if (Neg.getOpcode() != ISD::SUB)		if (Neg.getOpcode() != ISD::SUB)
return false;		return false;
ConstantSDNode *NegC = isConstOrConstSplat(Neg.getOperand(0));		ConstantSDNode *NegC = isConstOrConstSplat(Neg.getOperand(0));
if (!NegC)		if (!NegC)
return false;		return false;
SDValue NegOp1 = Neg.getOperand(1);		SDValue NegOp1 = Neg.getOperand(1);

// On the RHS of [A], if Pos is Pos' & (EltSize - 1), just replace Pos with		// On the RHS of [A], if Pos is Pos' & (EltSize - 1), just replace Pos with
// Pos'. The truncation is redundant for the purpose of the equality.		// Pos'. The truncation is redundant for the purpose of the equality.
if (MaskLoBits && Pos.getOpcode() == ISD::AND) {		if (MaskLoBits && Pos.getOpcode() == ISD::AND) {
if (ConstantSDNode *PosC = isConstOrConstSplat(Pos.getOperand(1))) {		if (ConstantSDNode *PosC = isConstOrConstSplat(Pos.getOperand(1))) {
KnownBits Known;		KnownBits Known;
DAG.computeKnownBits(Pos.getOperand(0), Known);		DAG.computeKnownBits(Pos.getOperand(0), Known);
if (PosC->getAPIntValue().getActiveBits() <= MaskLoBits &&		if (PosC->getAPIntValue().getActiveBits() <= MaskLoBits &&
((PosC->getAPIntValue() | Known.Zero).countTrailingOnes() >=		((PosC->getAPIntValue() | Known.Zero).countTrailingOnes() >=
▲ Show 20 Lines • Show All 13,219 Lines • Show Last 20 Lines
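The repackaging step at the end of ReassociateOrForRotate combines the surviving operands pairwise until a single root remains. A standalone model of that loop, with plain integers and bitwise OR standing in for SDValues and `DAG.getNode(ISD::OR, ...)` (the name `orPairwise` is made up for this sketch):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Combine the operands pairwise, halving the list on every round,
// until one value is left. An odd leftover element is carried over
// to the next round unchanged.
uint32_t orPairwise(std::vector<uint32_t> Ops) {
  assert(!Ops.empty() && "need at least one operand");
  size_t Size = Ops.size();
  while (Size != 1) {
    for (size_t I = 0; I != Size / 2; ++I)
      Ops[I] = Ops[2 * I] | Ops[2 * I + 1];
    if (Size % 2 != 0) {
      Ops[Size / 2] = Ops[Size - 1];
      Size = Size / 2 + 1;
    } else {
      Size /= 2;
    }
  }
  // The last remaining value is the root of the balanced tree.
  return Ops[0];
}
```

Building the tree this way keeps its depth logarithmic in the number of operands, as opposed to a left-leaning chain with one OR per level.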

test/CodeGen/Hexagon/rotate-multi.ll

This file was added.

				; RUN: llc -march=hexagon < %s | FileCheck %s

				; OR of two rotates of %a0(r0).
				; CHECK-LABEL: f0:
				; CHECK: r[[R00:[0-9]+]] = rol(r0,#7)
				; CHECK: r[[R00]] |= rol(r0,#9)
				define i32 @f0(i32 %a0) #0 {
				b0:
				%v0 = shl i32 %a0, 7
				%v1 = lshr i32 %a0, 25
				%v2 = or i32 %v0, %v1
				%v3 = shl i32 %a0, 9
				%v4 = lshr i32 %a0, 23
				%v5 = or i32 %v3, %v4
				%v6 = or i32 %v2, %v5
				ret i32 %v6
				}

				; OR of two rotates of %a0(r0) with an extra input %a1(r1).
				; CHECK-LABEL: f1:
				; CHECK: r1 |= asl(r0,#7)
				; CHECK: r1 |= rol(r0,#9)
				define i32 @f1(i32 %a0, i32 %a1) #0 {
				b0:
				%v0 = shl i32 %a0, 7
				%v1 = lshr i32 %a0, 25
				%v2 = or i32 %v0, %a1
				%v3 = shl i32 %a0, 9
				%v4 = lshr i32 %a0, 23
				%v5 = or i32 %v3, %v4
				%v6 = or i32 %v2, %v5
				%v7 = or i32 %v6, %v1
				ret i32 %v6
				}

				; OR of two rotates of two different inputs: %a0(r0) and %a1(r1).
				; CHECK-LABEL: f2:
				; CHECK: r0 = rol(r0,#11)
				; CHECK: r0 |= rol(r1,#19)
				define i32 @f2(i32 %a0, i32 %a1) #0 {
				%v0 = shl i32 %a0, 11
				%v1 = lshr i32 %a0, 21
				%v2 = shl i32 %a1, 19
				%v3 = lshr i32 %a1, 13
				%v4 = or i32 %v0, %v2
				%v5 = or i32 %v1, %v3
				%v6 = or i32 %v4, %v5
				ret i32 %v6
				}

				; ORs of multiple shifts of the same value with only one pair actually
				; matching a rotate.
				; CHECK-LABEL: f3:
				; CHECK-NOT: rol
				; CHECK: r[[R30:[0-9]+]] |= rol(r0,#7)
				; CHECK-NOT: rol
				define i32 @f3(i32 %a0) #0 {
				%v0 = shl i32 %a0, 3
				%v1 = shl i32 %a0, 5
				%v2 = shl i32 %a0, 7 ; rotate
				%v3 = shl i32 %a0, 13
				%v4 = shl i32 %a0, 19
				%v5 = lshr i32 %a0, 2
				%v6 = lshr i32 %a0, 15
				%v7 = lshr i32 %a0, 23
				%v8 = lshr i32 %a0, 25 ; rotate
				%v9 = lshr i32 %a0, 30
				%v10 = or i32 %v0, %v1
				%v11 = or i32 %v10, %v2
				%v12 = or i32 %v11, %v3
				%v13 = or i32 %v12, %v4
				%v14 = or i32 %v13, %v5
				%v15 = or i32 %v14, %v6
				%v16 = or i32 %v15, %v7
				%v17 = or i32 %v16, %v8
				%v18 = or i32 %v17, %v9
				ret i32 %v18
				}

				attributes #0 = { readnone nounwind "target-cpu"="hexagonv60" "target-features"="-packets" }

test/CodeGen/Hexagon/rotate.ll

	[... 141 lines not shown ...]
	b0:
	%v0 = shl i32 %a1, 7
	%v1 = lshr i32 %a1, 25
	%v2 = or i32 %v0, %v1
	%v3 = and i32 %v2, %a0
	ret i32 %v3
	}

				; CHECK-LABEL: f11
				; CHECK: r0 |= rol(r1,#7)
				define i32 @f11(i32 %a0, i32 %a1) #0 {
				b0:
				%v0 = shl i32 %a1, 7
				%v1 = lshr i32 %a1, 25
				%v2 = or i32 %v1, %a0
				%v3 = or i32 %v2, %v0
				ret i32 %v3
				}

	; CHECK-LABEL: f12
	; CHECK: r0 ^= rol(r1,#7)
	define i32 @f12(i32 %a0, i32 %a1) #0 {
	b0:
	%v0 = shl i32 %a1, 7
	%v1 = lshr i32 %a1, 25
	%v2 = or i32 %v0, %v1
	%v3 = xor i32 %v2, %a0
	[... 28 lines not shown ...]
	b0:
	%v0 = shl i64 %a1, 7
	%v1 = lshr i64 %a1, 57
	%v2 = or i64 %v0, %v1
	%v3 = and i64 %v2, %a0
	ret i64 %v3
	}

				; CHECK-LABEL: f16
				; CHECK: r1:0 |= rol(r3:2,#7)
				define i64 @f16(i64 %a0, i64 %a1) #0 {
				b0:
				%v0 = shl i64 %a1, 7
				%v1 = lshr i64 %a1, 57
				%v2 = or i64 %v1, %a0
				%v3 = or i64 %v2, %v0
				ret i64 %v3
				}

	; CHECK-LABEL: f17
	; CHECK: r1:0 ^= rol(r3:2,#7)
	define i64 @f17(i64 %a0, i64 %a1) #0 {
	b0:
	%v0 = shl i64 %a1, 7
	%v1 = lshr i64 %a1, 57
	%v2 = or i64 %v0, %v1
	%v3 = xor i64 %v2, %a0
	ret i64 %v3
	}

	attributes #0 = { norecurse nounwind readnone "target-cpu"="hexagonv60" "target-features"="-packets" }
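
The new f11 and f16 tests feed the two halves of the rotate into different ORs, so the rotate only becomes visible after the ORs are reassociated. As a sanity check of the identity those CHECK lines rely on, here is a small Python model. It is only a sketch: `rotl` and `ored_shifts` are hypothetical helper names, and Python integers are masked by hand to emulate i32 and i64.

```python
def rotl(x, n, bits=32):
    """Rotate the low `bits` bits of x left by n."""
    mask = (1 << bits) - 1
    x &= mask
    n %= bits
    return ((x << n) | (x >> (bits - n))) & mask


def ored_shifts(a0, a1, n=7, bits=32):
    """Mirror the shape of f11/f16: ((lshr a1, bits-n) | a0) | (shl a1, n)."""
    mask = (1 << bits) - 1
    a0 &= mask
    a1 &= mask
    v0 = (a1 << n) & mask   # shl a1, n
    v1 = a1 >> (bits - n)   # lshr a1, bits - n
    v2 = v1 | a0            # the first OR takes the unrelated input
    return v2 | v0          # the second OR supplies the other half of the rotate
```

For any inputs, `ored_shifts(a0, a1)` equals `a0 | rotl(a1, 7)`, and the same holds with `bits=64` for the f16 shape (shift amounts 7 and 57).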

test/CodeGen/X86/rotate-multi.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mtriple=x86_64-- < %s | FileCheck %s
				lebedev.ri: X86 (and most other) tests mostly use `utils/update_llc_test_checks.py`.
				kparzysz (author): The output looks worse: it checks almost every instruction and has specific register names (outside of argument registers).
				RKSimon: But it's much less likely to allow mistakes to get through. Plus, on x86 it's often very useful to check all the surrounding code, as that often hides issues. Once the test file is finalised, best practice is to run update_llc_test_checks on it against trunk, commit that, and then update the patch to show the codegen diff. Also, your RUN line should use a triple, not an arch.

				; OR of two rotates of %a0(edi).
				define i32 @f0(i32 %a0) #0 {
				; CHECK-LABEL: f0:
				; CHECK: # %bb.0: # %b0
				; CHECK-NEXT: movl %edi, %eax
				; CHECK-NEXT: roll $9, %eax
				; CHECK-NEXT: roll $7, %edi
				; CHECK-NEXT: orl %eax, %edi
				; CHECK-NEXT: movl %edi, %eax
				; CHECK-NEXT: retq
				b0:
				%v0 = shl i32 %a0, 7
				%v1 = lshr i32 %a0, 25
				%v2 = or i32 %v0, %v1
				%v3 = shl i32 %a0, 9
				%v4 = lshr i32 %a0, 23
				%v5 = or i32 %v3, %v4
				%v6 = or i32 %v2, %v5
				ret i32 %v6
				}

				; OR of two rotates of %a0(edi) with an extra input %a1(esi).
				define i32 @f1(i32 %a0, i32 %a1) #0 {
				; CHECK-LABEL: f1:
				; CHECK: # %bb.0: # %b0
				; CHECK-NEXT: movl %edi, %eax
				; CHECK-NEXT: shll $7, %eax
				; CHECK-NEXT: roll $9, %edi
				; CHECK-NEXT: orl %esi, %edi
				; CHECK-NEXT: orl %eax, %edi
				; CHECK-NEXT: movl %edi, %eax
				; CHECK-NEXT: retq
				b0:
				%v0 = shl i32 %a0, 7
				%v1 = lshr i32 %a0, 25
				%v2 = or i32 %v0, %a1
				%v3 = shl i32 %a0, 9
				%v4 = lshr i32 %a0, 23
				%v5 = or i32 %v3, %v4
				%v6 = or i32 %v2, %v5
				%v7 = or i32 %v6, %v1
				ret i32 %v6
				}

				; OR of two rotates of two different inputs: %a0(edi) and %a1(esi).
				define i32 @f2(i32 %a0, i32 %a1) #0 {
				; CHECK-LABEL: f2:
				; CHECK: # %bb.0:
				; CHECK-NEXT: roll $19, %esi
				; CHECK-NEXT: roll $11, %edi
				; CHECK-NEXT: orl %esi, %edi
				; CHECK-NEXT: movl %edi, %eax
				; CHECK-NEXT: retq
				%v0 = shl i32 %a0, 11
				%v1 = lshr i32 %a0, 21
				%v2 = shl i32 %a1, 19
				%v3 = lshr i32 %a1, 13
				%v4 = or i32 %v0, %v2
				%v5 = or i32 %v1, %v3
				%v6 = or i32 %v4, %v5
				ret i32 %v6
				}

				; ORs of multiple shifts of the same value with only one pair actually
				; matching a rotate.
				define i32 @f3(i32 %a0) #0 {
				; CHECK-LABEL: f3:
				; CHECK: # %bb.0:
				; CHECK-NEXT: # kill: def $edi killed $edi def $rdi
				; CHECK-NEXT: leal (,%rdi,8), %eax
				; CHECK-NEXT: movl %edi, %ecx
				; CHECK-NEXT: shll $5, %ecx
				; CHECK-NEXT: orl %eax, %ecx
				; CHECK-NEXT: movl %edi, %eax
				; CHECK-NEXT: shll $13, %eax
				; CHECK-NEXT: orl %ecx, %eax
				; CHECK-NEXT: movl %edi, %ecx
				; CHECK-NEXT: shll $19, %ecx
				; CHECK-NEXT: orl %eax, %ecx
				; CHECK-NEXT: movl %edi, %eax
				; CHECK-NEXT: shrl $2, %eax
				; CHECK-NEXT: movl %edi, %edx
				; CHECK-NEXT: shrl $15, %edx
				; CHECK-NEXT: orl %eax, %edx
				; CHECK-NEXT: movl %edi, %eax
				; CHECK-NEXT: shrl $23, %eax
				; CHECK-NEXT: orl %edx, %eax
				; CHECK-NEXT: orl %ecx, %eax
				; CHECK-NEXT: movl %edi, %ecx
				; CHECK-NEXT: shrl $30, %ecx
				; CHECK-NEXT: roll $7, %edi
				; CHECK-NEXT: orl %eax, %edi
				; CHECK-NEXT: orl %ecx, %edi
				; CHECK-NEXT: movl %edi, %eax
				; CHECK-NEXT: retq
				%v0 = shl i32 %a0, 3
				%v1 = shl i32 %a0, 5
				%v2 = shl i32 %a0, 7 ; rotate
				%v3 = shl i32 %a0, 13
				%v4 = shl i32 %a0, 19
				%v5 = lshr i32 %a0, 2
				%v6 = lshr i32 %a0, 15
				%v7 = lshr i32 %a0, 23
				%v8 = lshr i32 %a0, 25 ; rotate
				%v9 = lshr i32 %a0, 30
				%v10 = or i32 %v0, %v1
				%v11 = or i32 %v10, %v2
				%v12 = or i32 %v11, %v3
				%v13 = or i32 %v12, %v4
				%v14 = or i32 %v13, %v5
				%v15 = or i32 %v14, %v6
				%v16 = or i32 %v15, %v7
				%v17 = or i32 %v16, %v8
				%v18 = or i32 %v17, %v9
				ret i32 %v18
				}

				attributes #0 = { readnone nounwind }
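
One detail of f1 as written above: it returns `%v6`, so `%v7` (and with it `%v1`, the `lshr` by 25) has no users. That is consistent with the autogenerated checks for f1, which show a plain `shll $7` next to a single `roll $9`. The sketch below walks the operands back from the returned value to list what is actually live. The `F1_DEFS` table is transcribed by hand from the IR, and `live_values` is an illustrative helper, not part of the patch.

```python
# Operands of each value defined in f1, transcribed from the IR above.
F1_DEFS = {
    "v0": ["a0"],        # shl  %a0, 7
    "v1": ["a0"],        # lshr %a0, 25
    "v2": ["v0", "a1"],  # or
    "v3": ["a0"],        # shl  %a0, 9
    "v4": ["a0"],        # lshr %a0, 23
    "v5": ["v3", "v4"],  # or
    "v6": ["v2", "v5"],  # or
    "v7": ["v6", "v1"],  # or
}


def live_values(defs, returned):
    """Return every value reachable from `returned` through operand edges."""
    live, work = set(), [returned]
    while work:
        name = work.pop()
        if name in live:
            continue
        live.add(name)
        work.extend(defs.get(name, []))  # function arguments have no entry
    return live
```

With `returned="v6"` the live set leaves out `v1` and `v7`, so only the 9/23 pair can form a rotate; with `returned="v7"` both halves of the rotate by 7 become live as well.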


Revision Contents

Diff 151777
