- Expanding X%C into the equivalent X-X/C*C is not always the fastest path when no matching SDIV exists, so first check whether the target has a faster expansion for srem alone. (A standalone sketch of the power-of-two identity follows below.)
- Add an AArch64 fast path for the SREM-only power-of-two case.
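As context (this is my illustration, not code from the patch), a minimal standalone C++ sketch of the signed power-of-two remainder identity the srem-only fast path exploits: the remainder can be formed from a mask and a sign-based select, with no division.

```cpp
#include <cassert>

// Illustrative only: computes x % C for C = 1 << k without a division,
// mirroring the mask-and-conditional-negate shape of the srem-only path.
// (Avoids x == INT_MIN for simplicity; the real lowering handles all inputs.)
int srem_pow2(int x, int k) {
  int mask = (1 << k) - 1;   // C - 1
  int pos = x & mask;        // remainder when x >= 0
  int neg = -((-x) & mask);  // remainder when x <  0
  return x >= 0 ? pos : neg;
}

int main() {
  for (int x = -9; x <= 9; ++x)
    for (int k = 1; k <= 3; ++k)
      assert(srem_pow2(x, k) == x % (1 << k));
  return 0;
}
```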
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
13591 | Probably worth a comment explaining this. Lg2 == 0 folds to a constant. Not obvious why we're special-casing Lg2 == (VT.getScalarSizeInBits() - 1); does this sequence not work somehow? | |
llvm/test/CodeGen/AArch64/srem-lkk.ll | ||
---|---|---|
171 ↗ | (On Diff #420710) | Maybe we also want some tests for i16? Please post a separate patch to add the new tests, so it's easier to compare the impact. And actually, while you're at it, maybe stick all the power-of-two tests in a separate file with a better name. |
llvm/test/CodeGen/AArch64/srem-vector-lkk.ll | ||
---|---|---|
188 | I think this somehow ended up shorter with D122829? That might be fine, though; it looks like this version avoids the expensive add-with-shift instructions. |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
4325 | I'm not sure I like the modified API for visitSDIVLike? It's sort of weird that the return value is either the quotient or the remainder, depending on the value of BuildRem. And that BuildRem is both an input and an output. |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
4325 | Yeah, this part is ugly. I will refactor it today. |
- refactor visitSDIVLike
- remove Lg2 == (VT.getScalarSizeInBits() - 1)
- add srem-pow2.ll; will rebase after D123671 lands
Please ignore the check-in rGe2d77a160c5b; I made a mistake while checking in another baseline.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
4582 | Do we need to do something special for something like the following (IR reproduced below the table), to match the existing handling for div/rem pairs? I guess it's only a couple instructions different either way, but maybe worth considering. | |
23902 | Fix this comment? Not sure how you compute srem "by right shifting". |

define void @sdivrem(i32 %x, i32* %ap, i32* %bp) {
  %a = sdiv i32 %x, 4
  %b = srem i32 %x, 4
  store i32 %a, i32* %ap
  store i32 %b, i32* %bp
  ret void
}
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
4582 | For now I have no AArch64 device to test on; do we have some tool to verify instructions on AArch64, similar to alive2.llvm.org/ce/? | |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
4582 | I wrote out the logic in IR. Is this the behavior you want? | |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
4582 | Based on running the IR through llc, it looks like an sdiv/srem pair can't benefit from this pattern. It also looks like we lack a DAG combine from neg + cmp to negs, based on the test. | |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
4582 | I just meant that I wasn't sure if we would end up with the same code sequence as we would without this patch; I wasn't suggesting any specific transform. (I guess what happens might be sensitive to whether we visit the SDIV or the SREM first...) |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
4582 | I checked the !DAG.doesNodeExist(ISD::SDIV, N->getVTList(), {N0, N1}) guard at line 4412; an SDIV/SREM pair won't go down the srem-only path. |
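To make that dispatch concrete, here is a small self-contained model of the decision described above; the names and structure are mine for illustration, not LLVM's actual DAGCombiner code.

```cpp
#include <cassert>

// Toy model of the guard at line 4412 discussed above: an sdiv/srem pair
// keeps the shared x - (x/C)*C expansion, while a lone srem by a power of
// two may take the target's srem-only fast path.
enum class Expansion { SharedSdivPair, SremOnlyFastPath };

Expansion chooseSremExpansion(bool matchingSdivExists, bool divisorIsPow2,
                              bool targetHasSremOnlyPath) {
  if (!matchingSdivExists && divisorIsPow2 && targetHasSremOnlyPath)
    return Expansion::SremOnlyFastPath;
  return Expansion::SharedSdivPair;
}

int main() {
  // An sdiv/srem pair never goes down the srem-only path.
  assert(chooseSremExpansion(true, true, true) == Expansion::SharedSdivPair);
  // A lone srem by a power of two does, when the target supports it.
  assert(chooseSremExpansion(false, true, true) == Expansion::SremOnlyFastPath);
  return 0;
}
```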
FYI, I’ve bisected a misoptimization down to this commit. I’ll follow up later with a proper repro…
This commit causes a miscompilation of the following function:
int func(int val) { val += val % 2; return val; }
Tested with a caller like this:
#include <stdio.h>

int func(int val);

int main(int argc, char* argv[]) {
  for (int i = -3; i <= 3; i++)
    printf("%d -> %d\n", i, (int)func(i));
  return 0;
}
Originally this function returns the following values:
-3 -> -4
-2 -> -2
-1 -> -2
0 -> 0
1 -> 2
2 -> 2
3 -> 4
After this commit, it prints the following:
-3 -> -2
-2 -> -2
-1 -> 0
0 -> 0
1 -> 0
2 -> 2
3 -> 2
Originally, the function was compiled into the following (with -O2):
func:
        cmp     w0, #0
        cinc    w8, w0, lt
        and     w8, w8, #0xfffffffe
        sub     w8, w0, w8
        add     w0, w8, w0
        ret
After this change, it gets incorrectly compiled into this:
func:
        and     w8, w0, #0x1
        cmp     w0, #0
        cneg    w8, w8, ge
        add     w0, w8, w0
        ret
The cneg should be 'csneg w8, w8, w8, ge' here; the corrected tail is:

        csneg   w8, w8, w8, ge
        add     w0, w8, w0
        ret
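For reference, a minimal standalone model of why the emitted cneg inverts the remainder's sign while the suggested csneg does not, assuming the Arm ARM semantics of both instructions (the helper functions below are mine, for illustration):

```cpp
#include <cassert>
#include <cstdint>

// cneg  Wd, Wn, cond      =>  Wd = cond ? -Wn : Wn
// csneg Wd, Wn, Wm, cond  =>  Wd = cond ?  Wn : -Wm
static int32_t cneg(int32_t n, bool cond) { return cond ? -n : n; }
static int32_t csneg(int32_t n, int32_t m, bool cond) { return cond ? n : -m; }

// Buggy lowering: and w8, w0, #1; cmp w0, #0; cneg w8, w8, ge; add; ret
static int32_t func_buggy(int32_t x) {
  int32_t r = cneg(x & 1, /*ge=*/x >= 0);  // negates exactly when x >= 0: wrong
  return x + r;
}

// Fixed lowering: csneg keeps x & 1 when x >= 0, negates it otherwise.
static int32_t func_fixed(int32_t x) {
  int32_t r = csneg(x & 1, x & 1, /*ge=*/x >= 0);
  return x + r;
}

int main() {
  for (int32_t i = -3; i <= 3; ++i)
    assert(func_fixed(i) == i + i % 2);  // matches the original C function
  assert(func_buggy(-3) == -2);          // reproduces the bad output above
  return 0;
}
```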
Thanks for finding this. I have no AArch64 device at the moment, so could you help by modifying line 13600 to:
SDValue Cmp = getAArch64Cmp(N0, Zero, ISD::SETGE, CCVal, DAG, DL);
If you want to try things out yourself on Linux, it isn't too hard to run binaries using qemu-user. Get a cross-compile toolchain (e.g. https://developer.arm.com/-/media/Files/downloads/gnu/11.2-2022.02/binrel/gcc-arm-11.2-2022.02-x86_64-aarch64-none-linux-gnu.tar.xz ). Then do something like:
clang --gcc-toolchain=/path/to/toolchain --target=aarch64-none-linux-gnu test.c -o test -static
qemu-aarch64 test
I'll run this through some tests today.
Thanks for the tip on running under qemu user mode.
I will set up the environment on WSL and test on my local machine.
Please rename this so it's clear it's not the primary visitor function for SREM. Maybe call it "buildOptimizedSREM" or something like that.