This is an archive of the discontinued LLVM Phabricator instance.

Differential D114357

[CodeGen][AArch64] Ensure isSExtCheaperThanZExt returns true for negative constants
ClosedPublic

Authored by david-arm on Nov 22 2021, 3:57 AM.

Details

Reviewers

sdesmalen
CarolineConcatto
peterwaller-arm
RKSimon
kmclaughlin
Esme

Commits

rG197f3c0deb76: [CodeGen][AArch64] Ensure isSExtCheaperThanZExt returns true for negative…
rG31009f0b5afb: [CodeGen][AArch64] Ensure isSExtCheaperThanZExt returns true for negative…

Summary

When we know the value we're extending is a negative constant then it
makes sense to use SIGN_EXTEND because this may improve code quality in
some cases, particularly when doing a constant splat of an unpacked vector
type. For example, for SVE when splatting the value -1 into all elements
of a vector of type <vscale x 2 x i32> the element type will get promoted
from i32 -> i64. In this case we want the splat value to sign-extend from
(i32 -1) -> (i64 -1), whereas currently it zero-extends from
(i32 -1) -> (i64 0xFFFFFFFF). Sign-extending the constant means we can use
a single mov immediate instruction.
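
The difference between the two extensions can be checked with plain integer arithmetic. The following is a minimal sketch, not code from the patch; the helper names `zext32to64` and `sext32to64` are made up for illustration.

```cpp
#include <cstdint>

// Zero-extend a 32-bit element to 64 bits: the upper 32 bits become 0.
constexpr uint64_t zext32to64(uint32_t v) { return static_cast<uint64_t>(v); }

// Sign-extend a 32-bit element to 64 bits: the upper 32 bits copy bit 31.
constexpr uint64_t sext32to64(uint32_t v) {
  return static_cast<uint64_t>(static_cast<int64_t>(static_cast<int32_t>(v)));
}
```

For the bit pattern of (i32 -1), `zext32to64` yields 0x00000000FFFFFFFF while `sext32to64` yields 0xFFFFFFFFFFFFFFFF, i.e. (i64 -1), which is the value the summary says can be produced with a single mov immediate.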

New tests added here:

CodeGen/AArch64/sve-vector-splat.ll

I believe we see some code quality improvements in these existing
tests too:

CodeGen/AArch64/dag-numsignbits.ll
CodeGen/AArch64/reduce-and.ll
CodeGen/AArch64/unfold-masked-merge-vector-variablemask.ll

The apparent regressions in CodeGen/AArch64/fast-isel-cmp-vec.ll only
occur because the test disables codegen prepare and branch folding.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

david-arm created this revision. Nov 22 2021, 3:57 AM

Herald added subscribers: ctetreau, steven.zhang, pengfei and 3 others. Nov 22 2021, 3:57 AM

david-arm requested review of this revision. Nov 22 2021, 3:57 AM

Herald added a project: Restricted Project. Nov 22 2021, 3:57 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B135389: Diff 388865. Nov 22 2021, 3:57 AM

Couple of discussion points. I can see the rationale, but I wonder about some of the test changes and whether this could be revealing a latent bug.

llvm/test/CodeGen/X86/vector-fshl-512.ll
1107 ↗	(On Diff #388865)	Can anyone comment if the sign extensions in these constants are NFC?
llvm/test/CodeGen/X86/vector-shift-shl-512.ll
334 ↗	(On Diff #388865)	Can anyone comment if these sll -> mul changes are expected & harmless?

lebedev.ri added a subscriber: lebedev.ri. Nov 24 2021, 2:01 AM

lebedev.ri added inline comments.

llvm/test/CodeGen/X86/vector-shift-shl-512.ll
334 ↗	(On Diff #388865)	These look like regressions to me.

david-arm added inline comments. Nov 24 2021, 2:22 AM

llvm/test/CodeGen/X86/vector-shift-shl-512.ll
334 ↗	(On Diff #388865)	Hi @lebedev.ri, thanks for providing input here. Do you know if this means that there is a latent bug in the code where we should be explicitly using zero-extend here? I'm worried about cases where code is relying upon getAnyExtOrTrunc zero-extending.

RKSimon added inline comments. Nov 24 2021, 5:35 AM

llvm/test/CodeGen/X86/vector-shift-shl-512.ll
334 ↗	(On Diff #388865)	I'm not certain, but vXi8 shl by constant will fold to vXi8 multiplies by (pow2) constant, and will then be extended to vXi16 to make use of PMULLW - the upper bits aren't demanded so were any-extended, which was treated as a zero-extension during constant folding which preserved the pow2 nature which allowed it to be lowered back to a vXi16 shl. My guess is that now that they are sign-extended, the pow2 isn't seen any more.
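
The guess above can be illustrated with a small standalone sketch. It does not reproduce the X86 lowering code; it only shows that a power-of-two i8 constant with its top bit set stops being a power of two once it is sign-extended to i16. The helper names are hypothetical.

```cpp
#include <cstdint>

// True if exactly one bit is set.
constexpr bool isPowerOf2(uint16_t v) { return v != 0 && (v & (v - 1)) == 0; }

// Widen an 8-bit constant to 16 bits by zero-extension.
constexpr uint16_t zext8to16(uint8_t v) { return v; }

// Widen an 8-bit constant to 16 bits by sign-extension.
constexpr uint16_t sext8to16(uint8_t v) {
  return static_cast<uint16_t>(static_cast<int16_t>(static_cast<int8_t>(v)));
}
```

A shift left by 7 corresponds to a multiply by 0x80. Zero-extended, that constant is 0x0080 and still a power of two; sign-extended, it becomes 0xFF80 and is not. Constants without the top bit set, such as 0x40, are unaffected.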

craig.topper added a subscriber: craig.topper. Nov 24 2021, 8:47 AM

craig.topper added inline comments.

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
1319	There's no guarantee that the caller would use getAnyExtOrTrunc. I think this should be in the constant folding for getNode(ISD::ANY_EXTEND)

craig.topper added inline comments. Nov 24 2021, 8:51 AM

llvm/test/CodeGen/AArch64/sve-vector-splat.ll
119	Pre-commit the new tests so we can see the change?

david-arm planned changes to this revision. Dec 16 2021, 7:02 AM

Rewrote the patch to make use of isSExtCheaperThanZExt instead, so that this becomes an AArch64-specific change.

Herald added subscribers: luke957, frasercrmck, luismarques and 20 others. Jan 7 2022, 7:30 AM

david-arm edited the summary of this revision. Jan 7 2022, 7:30 AM

Harbormaster completed remote builds in B142082: Diff 398142. Jan 7 2022, 7:30 AM

david-arm added a parent revision: D116810: [NFC] Add tests for splats of illegal integer vector types. Jan 7 2022, 7:30 AM

craig.topper added inline comments. Jan 7 2022, 4:16 PM

llvm/lib/Target/AArch64/AArch64ISelLowering.h
1142	I think you can write `if (!V)`

Use if (!V) when checking for null SDValue objects.

david-arm marked an inline comment as done. Jan 10 2022, 3:34 AM

RKSimon added inline comments. Jan 10 2022, 3:54 AM

llvm/include/llvm/CodeGen/TargetLowering.h
2647–2648	Please can you update the documentation to explain the V arg (is it the src/dst?) and that it can be SDValue() if unknown.

Harbormaster completed remote builds in B142388: Diff 398561. Jan 10 2022, 4:42 AM

Updated comments above isSExtCheaperThanZExt.

david-arm marked an inline comment as done. Jan 11 2022, 5:37 AM

Harbormaster completed remote builds in B142638: Diff 398920. Jan 11 2022, 7:03 AM

LGTM with one minor

llvm/test/CodeGen/AArch64/fast-isel-cmp-vec.ll
2	Regenerate + pre-commit before committing the patch so the patch just shows the codegen diff.

This revision is now accepted and ready to land. Jan 12 2022, 3:44 AM

Closed by commit rG31009f0b5afb: [CodeGen][AArch64] Ensure isSExtCheaperThanZExt returns true for negative… (authored by david-arm). Jan 13 2022, 1:43 AM

This revision was automatically updated to reflect the committed changes.

david-arm added a commit: rG31009f0b5afb: [CodeGen][AArch64] Ensure isSExtCheaperThanZExt returns true for negative….

david-arm added a reverting change: rGba471ba8d2a3: Revert "[CodeGen][AArch64] Ensure isSExtCheaperThanZExt returns true for…. Jan 13 2022, 8:00 AM

david-arm added a commit: rG197f3c0deb76: [CodeGen][AArch64] Ensure isSExtCheaperThanZExt returns true for negative…. Jan 17 2022, 3:09 AM

Herald added a subscriber: alextsao1999. Jan 17 2022, 3:09 AM

I've bisected a miscompilation to this file.

To reproduce:

$ git clone git://source.ffmpeg.org/ffmpeg
$ cd ffmpeg
$ ./configure --cc=clang --samples=$(pwd)/../samples
$ make fate-rsync
$ make -j$(nproc)
$ make fate-dpcm-interplay

The breakage happens in the libavformat/ipmovie.c file. (I also saw a couple other broken tests, so I think there are other source files affected too, but I didn't bisect and pinpoint those failures.)

The issue can be observed with https://martin.st/temp/ipmovie-preproc.c, with clang -target aarch64-linux-gnu -O3 -o - ipmovie-preproc.c. The generated code contains differences like this:

--- old.s       2022-01-18 10:30:24.726016244 +0200
+++ new.s       2022-01-18 10:30:01.650536299 +0200
@@ -506,7 +506,7 @@
        mov     w1, #56
        bl      av_log
        add     x9, x19, #1104
-       mov     w21, #65535
+       mov     w21, #-1
 .LBB3_9:                                // %while.end
        ldr     x0, [x19]
        ldr     w8, [x0, #44]

Hi @mstorsjo, thanks for the info. I'll revert the patch again for now. It's strange because ANY_EXTEND should really mean "any", i.e. you don't care if it's sign-extend or zero-extend! I suspect there is a bug in codegen somewhere that either relies upon ANY_EXTEND actually being ZERO_EXTEND, or relies upon ANY_EXTEND being consistently the same. I'm worried because the isSExtCheaperThanZExt interface definitely allows for the possibility of sometimes choosing one over the other depending upon the types.
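
The "any" contract described here can be sketched outside of LLVM. The example below is purely illustrative: the constants mirror the two `mov w21` immediates from the assembly diff, and both consumer functions are hypothetical, not code found in the compiler.

```cpp
#include <cstdint>

// Two ways of materializing the 16-bit constant -1 in a 32-bit register.
constexpr uint32_t kZeroExtended = 0x0000FFFFu; // mov w21, #65535
constexpr uint32_t kSignExtended = 0xFFFFFFFFu; // mov w21, #-1

// A well-behaved user of an any-extended value reads only the low 16 bits.
constexpr uint16_t readLow16(uint32_t reg) { return static_cast<uint16_t>(reg); }

// A hypothetical buggy user that silently assumes the upper bits are zero.
constexpr bool equalsMinusOneAssumingZext(uint32_t reg) {
  return reg == 0x0000FFFFu;
}
```

`readLow16` returns the same value for both registers, so it is indifferent to the choice of extension. `equalsMinusOneAssumingZext` holds only for the zero-extended form, which is the kind of hidden dependency suspected above.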

david-arm added a reverting change: rGf4515ab858ec: Revert "[CodeGen][AArch64] Ensure isSExtCheaperThanZExt returns true for…. Jan 18 2022, 12:41 AM

Thanks, and that does indeed seem worrying.

To help find other, similar cases: following the same instructions but running make -j$(nproc) fate runs all tests. That currently triggers 24 failed tests (which I believe correspond to maybe 3-6 distinct breakages), in case you want to rerun more tests once you think you've pinpointed the root cause.

I believe the culprit is FunctionLoweringInfo::ComputePHILiveOutRegInfo which assumes that ConstantInt inputs to phis will be zero extended.
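
How such an assumption could turn into a miscompile can be modelled in a few lines. This is a simplified sketch and not the actual ComputePHILiveOutRegInfo logic: both functions are hypothetical, and the "dropped mask" stands in for any later optimization that trusts the recorded known bits.

```cpp
#include <cstdint>

// Hypothetical model of a live-out register fact: the number of high bits of
// a 32-bit register known to be zero, computed as if the 16-bit constant had
// been zero-extended.
constexpr unsigned knownLeadingZerosAssumingZext(uint16_t c) {
  unsigned n = 16; // the upper 16 bits are assumed to be zero
  for (int bit = 15; bit >= 0 && !((c >> bit) & 1); --bit)
    ++n;
  return n;
}

// A later combine may drop `reg & 0xFFFF` when at least 16 leading zeros are
// recorded as known; otherwise it keeps the mask.
constexpr uint32_t maskLow16(uint32_t reg, unsigned knownLeadingZeros) {
  return knownLeadingZeros >= 16 ? reg : (reg & 0xFFFFu);
}
```

For the constant 0xFFFF the model records 16 known leading zeros. If the register really holds 0x0000FFFF, dropping the mask is harmless; if the constant was materialized sign-extended as 0xFFFFFFFF, the unmasked result is wrong.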

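Revision Contents

Path	Size
llvm/include/llvm/CodeGen/TargetLowering.h	6 lines
llvm/lib/CodeGen/CodeGenPrepare.cpp	2 lines
llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp	2 lines
llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h	2 lines
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp	2 lines
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp	2 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.h	8 lines
llvm/lib/Target/RISCV/RISCVISelLowering.h	2 lines
llvm/lib/Target/RISCV/RISCVISelLowering.cpp	3 lines
llvm/test/CodeGen/AArch64/arm64-vshuffle.ll	18 lines
llvm/test/CodeGen/AArch64/arm64_32-atomics.ll	2 lines
llvm/test/CodeGen/AArch64/cmpxchg-idioms.ll	6 lines
llvm/test/CodeGen/AArch64/dag-numsignbits.ll	11 lines
llvm/test/CodeGen/AArch64/fast-isel-cmp-vec.ll	17 lines
llvm/test/CodeGen/AArch64/funnel-shift.ll	14 lines
llvm/test/CodeGen/AArch64/reduce-and.ll	3 lines
llvm/test/CodeGen/AArch64/redundant-copy-elim-empty-mbb.ll	2 lines
llvm/test/CodeGen/AArch64/statepoint-call-lowering.ll	2 lines
llvm/test/CodeGen/AArch64/sve-vector-splat.ll	9 lines
llvm/test/CodeGen/AArch64/unfold-masked-merge-vector-variablemask.ll	24 lines
llvm/test/CodeGen/AArch64/vecreduce-and-legalization.ll	3 lines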

Diff 399589

llvm/include/llvm/CodeGen/TargetLowering.h

@@ ... @@ public:
   }

   virtual bool isZExtFree(EVT FromTy, EVT ToTy) const { return false; }
   virtual bool isZExtFree(LLT FromTy, LLT ToTy, const DataLayout &DL,
                           LLVMContext &Ctx) const {
     return isZExtFree(getApproximateEVTForLLT(FromTy, DL, Ctx),
                       getApproximateEVTForLLT(ToTy, DL, Ctx));
   }

-  /// Return true if sign-extension from FromTy to ToTy is cheaper than
-  /// zero-extension.
-  virtual bool isSExtCheaperThanZExt(EVT FromTy, EVT ToTy) const {
+  /// Return true if sign-extension of value \p V from FromTy to ToTy is
+  /// cheaper than zero-extension, where \p V can be SDValue() if unknown.
+  virtual bool isSExtCheaperThanZExt(EVT FromTy, EVT ToTy, SDValue V) const {
     return false;
   }

   /// Return true if sinking I's operands to the same basic block as I is
   /// profitable, e.g. because the operands can be folded into a target
   /// instruction during instruction selection. After calling the function
   /// \p Ops contains the Uses to sink ordered by dominance (dominating users
   /// come first).

llvm/lib/CodeGen/CodeGenPrepare.cpp

@@ ... @@ bool CodeGenPrepare::optimizeSwitchInst(SwitchInst *SI) {
   auto *NewType = Type::getIntNTy(Context, RegWidth);

   // Extend the switch condition and case constants using the target preferred
   // extend unless the switch condition is a function argument with an extend
   // attribute. In that case, we can avoid an unnecessary mask/extension by
   // matching the argument extension instead.
   Instruction::CastOps ExtType = Instruction::ZExt;
   // Some targets prefer SExt over ZExt.
-  if (TLI->isSExtCheaperThanZExt(OldVT, RegType))
+  if (TLI->isSExtCheaperThanZExt(OldVT, RegType, SDValue()))
     ExtType = Instruction::SExt;

   if (auto *Arg = dyn_cast<Argument>(Cond)) {
     if (Arg->hasSExtAttr())
       ExtType = Instruction::SExt;
     if (Arg->hasZExtAttr())
       ExtType = Instruction::ZExt;
   }

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp

@@ ... @@ void DAGTypeLegalizer::PromoteSetCCOperands(SDValue &LHS, SDValue &RHS,
   }

   assert((ISD::isUnsignedIntSetCC(CCCode) || ISD::isIntEqualitySetCC(CCCode)) &&
          "Unknown integer comparison!");

   SDValue OpL = GetPromotedInteger(LHS);
   SDValue OpR = GetPromotedInteger(RHS);

-  if (TLI.isSExtCheaperThanZExt(LHS.getValueType(), OpL.getValueType())) {
+  if (TLI.isSExtCheaperThanZExt(LHS.getValueType(), OpL.getValueType(), LHS)) {
     // The target would prefer to promote the comparison operand with sign
     // extension. Honor that unless the promoted values are already zero
     // extended.
     unsigned OpLEffectiveBits =
         DAG.computeKnownBits(OpL).countMaxActiveBits();
     unsigned OpREffectiveBits =
         DAG.computeKnownBits(OpR).countMaxActiveBits();
     if (OpLEffectiveBits <= LHS.getScalarValueSizeInBits() &&

llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h

@@ ... @@ private:
   // Get a promoted operand and sign or zero extend it to the final size
   // (depending on TargetLoweringInfo::isSExtCheaperThanZExt). For a given
   // subtarget and type, the choice of sign or zero-extension will be
   // consistent.
   SDValue SExtOrZExtPromotedInteger(SDValue Op) {
     EVT OldVT = Op.getValueType();
     SDLoc DL(Op);
     Op = GetPromotedInteger(Op);
-    if (TLI.isSExtCheaperThanZExt(OldVT, Op.getValueType()))
+    if (TLI.isSExtCheaperThanZExt(OldVT, Op.getValueType(), Op))
       return DAG.getNode(ISD::SIGN_EXTEND_INREG, DL, Op.getValueType(), Op,
                          DAG.getValueType(OldVT));
     return DAG.getZeroExtendInReg(Op, DL, OldVT);
   }

   // Promote the given operand V (vector or scalar) according to N's specific
   // reduction kind. N must be an integer VECREDUCE_* or VP_REDUCE_*. Returns
   // the nominal extension opcode (ISD::(ANY|ZERO|SIGN)_EXTEND) and the

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp

@@ ... @@ SDValue Res =
       {Chain, Op, getIntPtrConstant(0, DL)});

   return std::pair<SDValue, SDValue>(Res, SDValue(Res.getNode(), 1));
 }

 SDValue SelectionDAG::getAnyExtOrTrunc(SDValue Op, const SDLoc &DL, EVT VT) {
   return VT.bitsGT(Op.getValueType()) ?
     getNode(ISD::ANY_EXTEND, DL, VT, Op) :
     getNode(ISD::TRUNCATE, DL, VT, Op);
 }

 SDValue SelectionDAG::getSExtOrTrunc(SDValue Op, const SDLoc &DL, EVT VT) {
   return VT.bitsGT(Op.getValueType()) ?
     getNode(ISD::SIGN_EXTEND, DL, VT, Op) :
     getNode(ISD::TRUNCATE, DL, VT, Op);
 }

@@ ... @@ case ISD::TRUNCATE:
       if (C->isOpaque())
         break;
       LLVM_FALLTHROUGH;
     case ISD::ZERO_EXTEND:
       return getConstant(Val.zextOrTrunc(VT.getSizeInBits()), DL, VT,
                          C->isTargetOpcode(), C->isOpaque());
     case ISD::ANY_EXTEND:
       // Some targets like RISCV prefer to sign extend some types.
-      if (TLI->isSExtCheaperThanZExt(Operand.getValueType(), VT))
+      if (TLI->isSExtCheaperThanZExt(Operand.getValueType(), VT, Operand))
         return getConstant(Val.sextOrTrunc(VT.getSizeInBits()), DL, VT,
                            C->isTargetOpcode(), C->isOpaque());
       return getConstant(Val.zextOrTrunc(VT.getSizeInBits()), DL, VT,
                          C->isTargetOpcode(), C->isOpaque());
     case ISD::UINT_TO_FP:
     case ISD::SINT_TO_FP: {
       APFloat apf(EVTToAPFloatSemantics(VT),
                   APInt::getZero(VT.getSizeInBits()));

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp

@@ ... @@ if (N0.getOpcode() == ISD::ZERO_EXTEND) {
           break;
         }
         default:
           break; // todo, be more careful with signed comparisons
         }
       } else if (N0.getOpcode() == ISD::SIGN_EXTEND_INREG &&
                  (Cond == ISD::SETEQ || Cond == ISD::SETNE) &&
                  !isSExtCheaperThanZExt(cast<VTSDNode>(N0.getOperand(1))->getVT(),
-                                        OpVT)) {
+                                        OpVT, N0.getOperand(1))) {
         EVT ExtSrcTy = cast<VTSDNode>(N0.getOperand(1))->getVT();
         unsigned ExtSrcTyBits = ExtSrcTy.getSizeInBits();
         EVT ExtDstTy = N0.getValueType();
         unsigned ExtDstTyBits = ExtDstTy.getSizeInBits();

         // If the constant doesn't fit into the number of bits for the source of
         // the sign extension, it is impossible for both sides to be equal.
         if (C1.getMinSignedBits() > ExtSrcTyBits)

llvm/lib/Target/AArch64/AArch64ISelLowering.h

@@ ... @@ private:
   // cannot be used to convert between unpacked and packed types.
   // These can make "bitcasting" a multiphase process. REINTERPRET_CAST is used
   // to transition between unpacked and packed types of the same element type,
   // with BITCAST used otherwise.
   SDValue getSVESafeBitCast(EVT VT, SDValue Op, SelectionDAG &DAG) const;

   bool isConstantUnsignedBitfieldExtractLegal(unsigned Opc, LLT Ty1,
                                               LLT Ty2) const override;
+
+  bool isSExtCheaperThanZExt(EVT SrcVT, EVT DstVT, SDValue V) const override {
+    if (!V)
+      return false;
+    if (ConstantSDNode *C = isConstOrConstSplat(V))
+      return C->getAPIntValue().isNegative();
+    return false;
+  }
 };

 namespace AArch64 {
 FastISel *createFastISel(FunctionLoweringInfo &funcInfo,
                          const TargetLibraryInfo *libInfo);
 } // end namespace AArch64

 } // end namespace llvm

 #endif
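
The decision made by the new AArch64 override can be mirrored in a standalone sketch. This is an illustration under stated assumptions rather than LLVM code: the function name is made up, and a nullable pointer to a 64-bit integer stands in for the `SDValue` argument, with null playing the role of `SDValue()` (value unknown).

```cpp
#include <cstdint>

// Hypothetical stand-in for the override: `constantOrNull` is null when the
// value is unknown, otherwise it points at the constant (or splat) value.
// Sign-extension is preferred only for a known negative constant.
bool sextCheaperThanZext(const int64_t *constantOrNull) {
  if (!constantOrNull)
    return false;
  return *constantOrNull < 0;
}
```

The sketch does not model a non-constant operand, for which the override in the patch also returns false.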

llvm/lib/Target/RISCV/RISCVISelLowering.h

@@ ... @@ public:
   bool isLegalAddressingMode(const DataLayout &DL, const AddrMode &AM, Type *Ty,
                              unsigned AS,
                              Instruction *I = nullptr) const override;
   bool isLegalICmpImmediate(int64_t Imm) const override;
   bool isLegalAddImmediate(int64_t Imm) const override;
   bool isTruncateFree(Type *SrcTy, Type *DstTy) const override;
   bool isTruncateFree(EVT SrcVT, EVT DstVT) const override;
   bool isZExtFree(SDValue Val, EVT VT2) const override;
-  bool isSExtCheaperThanZExt(EVT SrcVT, EVT DstVT) const override;
+  bool isSExtCheaperThanZExt(EVT SrcVT, EVT DstVT, SDValue V) const override;
   bool isCheapToSpeculateCttz() const override;
   bool isCheapToSpeculateCtlz() const override;
   bool hasAndNotCompare(SDValue Y) const override;
   bool shouldSinkOperands(Instruction *I,
                           SmallVectorImpl<Use *> &Ops) const override;
   bool isFPImmLegal(const APFloat &Imm, EVT VT,
                     bool ForCodeSize) const override;

llvm/lib/Target/RISCV/RISCVISelLowering.cpp

@@ ... @@ if ((MemVT == MVT::i8 || MemVT == MVT::i16) &&
         (LD->getExtensionType() == ISD::NON_EXTLOAD ||
          LD->getExtensionType() == ISD::ZEXTLOAD))
       return true;
   }

   return TargetLowering::isZExtFree(Val, VT2);
 }

-bool RISCVTargetLowering::isSExtCheaperThanZExt(EVT SrcVT, EVT DstVT) const {
+bool RISCVTargetLowering::isSExtCheaperThanZExt(EVT SrcVT, EVT DstVT,
+                                                SDValue V) const {
   return Subtarget.is64Bit() && SrcVT == MVT::i32 && DstVT == MVT::i64;
 }

 bool RISCVTargetLowering::isCheapToSpeculateCttz() const {
   return Subtarget.hasStdExtZbb();
 }

 bool RISCVTargetLowering::isCheapToSpeculateCtlz() const {

llvm/test/CodeGen/AArch64/arm64-vshuffle.ll

	; RUN: llc < %s -mtriple=arm64-apple-ios7.0 -mcpu=cyclone \| FileCheck %s			; RUN: llc < %s -mtriple=arm64-apple-ios7.0 -mcpu=cyclone \| FileCheck %s


	; CHECK: test1			; CHECK: test1
	; CHECK: movi.16b v[[REG0:[0-9]+]], #0			; CHECK: movi.16b v[[REG0:[0-9]+]], #0
	define <8 x i1> @test1() {			define <8 x i1> @test1() {
	entry:			entry:
	%Shuff = shufflevector <8 x i1> <i1 0, i1 1, i1 2, i1 3, i1 4, i1 5, i1 6,			%Shuff = shufflevector <8 x i1> <i1 0, i1 1, i1 2, i1 3, i1 4, i1 5, i1 6,
	i1 7>,			i1 7>,
	<8 x i1> <i1 0, i1 1, i1 2, i1 3, i1 4, i1 5, i1 6,			<8 x i1> <i1 0, i1 1, i1 2, i1 3, i1 4, i1 5, i1 6,
	i1 7>,			i1 7>,
	<8 x i32> <i32 2, i32 undef, i32 6, i32 undef, i32 10,			<8 x i32> <i32 2, i32 undef, i32 6, i32 undef, i32 10,
	i32 12, i32 14, i32 0>			i32 12, i32 14, i32 0>
	ret <8 x i1> %Shuff			ret <8 x i1> %Shuff
	}			}

	; CHECK: lCPI1_0:
	; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 1 ; 0x1
	; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 0 ; 0x0
	; CHECK: test2			; CHECK: test2
	; CHECK: adrp x[[REG2:[0-9]+]], lCPI1_0@PAGE			; CHECK: movi d{{[0-9]+}}, #0x0000ff00000000
	; CHECK: ldr d[[REG1:[0-9]+]], [x[[REG2]], lCPI1_0@PAGEOFF]
	define <8 x i1>@test2() {			define <8 x i1>@test2() {
	bb:			bb:
	%Shuff = shufflevector <8 x i1> zeroinitializer,			%Shuff = shufflevector <8 x i1> zeroinitializer,
	<8 x i1> <i1 0, i1 1, i1 1, i1 0, i1 0, i1 1, i1 0, i1 0>,			<8 x i1> <i1 0, i1 1, i1 1, i1 0, i1 0, i1 1, i1 0, i1 0>,
	<8 x i32> <i32 2, i32 undef, i32 6, i32 undef, i32 10, i32 12, i32 14,			<8 x i32> <i32 2, i32 undef, i32 6, i32 undef, i32 10, i32 12, i32 14,
	i32 0>			i32 0>
	ret <8 x i1> %Shuff			ret <8 x i1> %Shuff
	}			}

	; CHECK: test3			; CHECK: test3
	; CHECK: movi.4s v{{[0-9]+}}, #1			; CHECK: movi.2d v{{[0-9]+}}, #0x0000ff000000ff
	define <16 x i1> @test3(i1* %ptr, i32 %v) {			define <16 x i1> @test3(i1* %ptr, i32 %v) {
	bb:			bb:
	%Shuff = shufflevector <16 x i1> <i1 0, i1 1, i1 1, i1 0, i1 0, i1 1, i1 0, i1 0, i1 0, i1 1, i1 1, i1 0, i1 0, i1 1, i1 0, i1 0>, <16 x i1> undef,			%Shuff = shufflevector <16 x i1> <i1 0, i1 1, i1 1, i1 0, i1 0, i1 1, i1 0, i1 0, i1 0, i1 1, i1 1, i1 0, i1 0, i1 1, i1 0, i1 0>, <16 x i1> undef,
	<16 x i32> <i32 2, i32 undef, i32 6, i32 undef, i32 10, i32 12, i32 14,			<16 x i32> <i32 2, i32 undef, i32 6, i32 undef, i32 10, i32 12, i32 14,
	i32 0, i32 2, i32 undef, i32 6, i32 undef, i32 10, i32 12,			i32 0, i32 2, i32 undef, i32 6, i32 undef, i32 10, i32 12,
	i32 14, i32 0>			i32 14, i32 0>
	ret <16 x i1> %Shuff			ret <16 x i1> %Shuff
	}			}


	; CHECK: lCPI3_0:			; CHECK: lCPI3_0:
	; CHECK: .byte 0 ; 0x0			; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 0 ; 0x0			; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 0 ; 0x0			; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 1 ; 0x1			; CHECK: .byte 255 ; 0xff
	; CHECK: .byte 0 ; 0x0			; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 0 ; 0x0			; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 0 ; 0x0			; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 0 ; 0x0			; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 0 ; 0x0			; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 0 ; 0x0			; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 0 ; 0x0			; CHECK: .byte 0 ; 0x0
	; CHECK: .byte 0 ; 0x0			; CHECK: .byte 0 ; 0x0
	[16 lines not shown]

llvm/test/CodeGen/AArch64/arm64_32-atomics.ll

	[243 lines not shown]
	; CHECK-LABEL: test_cmpxchg_ptr:			; CHECK-LABEL: test_cmpxchg_ptr:
	; CHECK: [[LOOP:LBB[0-9]+_[0-9]+]]:			; CHECK: [[LOOP:LBB[0-9]+_[0-9]+]]:
	; CHECK: ldaxr [[OLD:w[0-9]+]], [x0]			; CHECK: ldaxr [[OLD:w[0-9]+]], [x0]
	; CHECK: cmp [[OLD]], w1			; CHECK: cmp [[OLD]], w1
	; CHECK: b.ne [[DONE:LBB[0-9]+_[0-9]+]]			; CHECK: b.ne [[DONE:LBB[0-9]+_[0-9]+]]
	; CHECK: stlxr [[SUCCESS:w[0-9]+]], w2, [x0]			; CHECK: stlxr [[SUCCESS:w[0-9]+]], w2, [x0]
	; CHECK: cbnz [[SUCCESS]], [[LOOP]]			; CHECK: cbnz [[SUCCESS]], [[LOOP]]

	; CHECK: mov w1, #1			; CHECK: mov w1, #-1
	; CHECK: mov w0, [[OLD]]			; CHECK: mov w0, [[OLD]]
	; CHECK: ret			; CHECK: ret

	; CHECK: [[DONE]]:			; CHECK: [[DONE]]:
	; CHECK: mov w1, wzr			; CHECK: mov w1, wzr
	; CHECK: mov w0, [[OLD]]			; CHECK: mov w0, [[OLD]]
	; CHECK: clrex			; CHECK: clrex
	; CHECK: ret			; CHECK: ret
	%res = cmpxchg i8** %addr, i8* %cmp, i8* %new acq_rel acquire			%res = cmpxchg i8** %addr, i8* %cmp, i8* %new acq_rel acquire
	ret {i8*, i1} %res			ret {i8*, i1} %res
	}			}

llvm/test/CodeGen/AArch64/cmpxchg-idioms.ll

	[9 lines not shown]
	; CHECK-NEXT: ldaxr w8, [x0]			; CHECK-NEXT: ldaxr w8, [x0]
	; CHECK-NEXT: cmp w8, w1			; CHECK-NEXT: cmp w8, w1
	; CHECK-NEXT: b.ne LBB0_4			; CHECK-NEXT: b.ne LBB0_4
	; CHECK-NEXT: ; %bb.2: ; %cmpxchg.trystore			; CHECK-NEXT: ; %bb.2: ; %cmpxchg.trystore
	; CHECK-NEXT: ; in Loop: Header=BB0_1 Depth=1			; CHECK-NEXT: ; in Loop: Header=BB0_1 Depth=1
	; CHECK-NEXT: stlxr w8, w2, [x0]			; CHECK-NEXT: stlxr w8, w2, [x0]
	; CHECK-NEXT: cbnz w8, LBB0_1			; CHECK-NEXT: cbnz w8, LBB0_1
	; CHECK-NEXT: ; %bb.3:			; CHECK-NEXT: ; %bb.3:
	; CHECK-NEXT: mov w0, #1			; CHECK-NEXT: mov w0, #-1
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	; CHECK-NEXT: LBB0_4: ; %cmpxchg.nostore			; CHECK-NEXT: LBB0_4: ; %cmpxchg.nostore
	; CHECK-NEXT: mov w0, wzr			; CHECK-NEXT: mov w0, wzr
	; CHECK-NEXT: clrex			; CHECK-NEXT: clrex
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	;			;
	; OUTLINE-ATOMICS-LABEL: test_return:			; OUTLINE-ATOMICS-LABEL: test_return:
	; OUTLINE-ATOMICS: ; %bb.0:			; OUTLINE-ATOMICS: ; %bb.0:
	[32 lines not shown]
	; CHECK-NEXT: ldaxrb w9, [x0]			; CHECK-NEXT: ldaxrb w9, [x0]
	; CHECK-NEXT: cmp w9, w8			; CHECK-NEXT: cmp w9, w8
	; CHECK-NEXT: b.ne LBB1_4			; CHECK-NEXT: b.ne LBB1_4
	; CHECK-NEXT: ; %bb.2: ; %cmpxchg.trystore			; CHECK-NEXT: ; %bb.2: ; %cmpxchg.trystore
	; CHECK-NEXT: ; in Loop: Header=BB1_1 Depth=1			; CHECK-NEXT: ; in Loop: Header=BB1_1 Depth=1
	; CHECK-NEXT: stlxrb w9, w2, [x0]			; CHECK-NEXT: stlxrb w9, w2, [x0]
	; CHECK-NEXT: cbnz w9, LBB1_1			; CHECK-NEXT: cbnz w9, LBB1_1
	; CHECK-NEXT: ; %bb.3:			; CHECK-NEXT: ; %bb.3:
	; CHECK-NEXT: mov w8, #1			; CHECK-NEXT: mov w8, #-1
	; CHECK-NEXT: eor w0, w8, #0x1			; CHECK-NEXT: eor w0, w8, #0x1
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	; CHECK-NEXT: LBB1_4: ; %cmpxchg.nostore			; CHECK-NEXT: LBB1_4: ; %cmpxchg.nostore
	; CHECK-NEXT: eor w0, wzr, #0x1			; CHECK-NEXT: eor w0, wzr, #0x1
	; CHECK-NEXT: clrex			; CHECK-NEXT: clrex
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	;			;
	; OUTLINE-ATOMICS-LABEL: test_return_bool:			; OUTLINE-ATOMICS-LABEL: test_return_bool:
	[107 lines not shown]
	; CHECK-NEXT: ldaxr w8, [x19]			; CHECK-NEXT: ldaxr w8, [x19]
	; CHECK-NEXT: cmp w8, w21			; CHECK-NEXT: cmp w8, w21
	; CHECK-NEXT: b.ne LBB3_4			; CHECK-NEXT: b.ne LBB3_4
	; CHECK-NEXT: ; %bb.2: ; %cmpxchg.trystore			; CHECK-NEXT: ; %bb.2: ; %cmpxchg.trystore
	; CHECK-NEXT: ; in Loop: Header=BB3_1 Depth=1			; CHECK-NEXT: ; in Loop: Header=BB3_1 Depth=1
	; CHECK-NEXT: stlxr w8, w20, [x19]			; CHECK-NEXT: stlxr w8, w20, [x19]
	; CHECK-NEXT: cbnz w8, LBB3_1			; CHECK-NEXT: cbnz w8, LBB3_1
	; CHECK-NEXT: ; %bb.3:			; CHECK-NEXT: ; %bb.3:
	; CHECK-NEXT: mov w8, #1			; CHECK-NEXT: mov w8, #-1
	; CHECK-NEXT: b LBB3_5			; CHECK-NEXT: b LBB3_5
	; CHECK-NEXT: LBB3_4: ; %cmpxchg.nostore			; CHECK-NEXT: LBB3_4: ; %cmpxchg.nostore
	; CHECK-NEXT: mov w8, wzr			; CHECK-NEXT: mov w8, wzr
	; CHECK-NEXT: clrex			; CHECK-NEXT: clrex
	; CHECK-NEXT: LBB3_5: ; %for.cond.preheader			; CHECK-NEXT: LBB3_5: ; %for.cond.preheader
	; CHECK-NEXT: mov w22, #2			; CHECK-NEXT: mov w22, #2
	; CHECK-NEXT: LBB3_6: ; %for.cond			; CHECK-NEXT: LBB3_6: ; %for.cond
	; CHECK-NEXT: ; =>This Inner Loop Header: Depth=1			; CHECK-NEXT: ; =>This Inner Loop Header: Depth=1
	[99 lines not shown]

llvm/test/CodeGen/AArch64/dag-numsignbits.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=aarch64-unknown | FileCheck %s			; RUN: llc < %s -mtriple=aarch64-unknown | FileCheck %s

	; PR32273			; PR32273

	define void @signbits_vXi1(<4 x i16> %a1) {			define void @signbits_vXi1(<4 x i16> %a1) {
	; CHECK-LABEL: signbits_vXi1:			; CHECK-LABEL: signbits_vXi1:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: adrp x8, .LCPI0_0			; CHECK-NEXT: adrp x8, .LCPI0_0
	; CHECK-NEXT: // kill: def $d0 killed $d0 def $q0			; CHECK-NEXT: // kill: def $d0 killed $d0 def $q0
	; CHECK-NEXT: mov w1, wzr			; CHECK-NEXT: movi v2.4h, #1
	; CHECK-NEXT: dup v0.4h, v0.h[0]			; CHECK-NEXT: dup v0.4h, v0.h[0]
				; CHECK-NEXT: mov w1, wzr
	; CHECK-NEXT: mov w2, wzr			; CHECK-NEXT: mov w2, wzr
	; CHECK-NEXT: ldr d1, [x8, :lo12:.LCPI0_0]			; CHECK-NEXT: ldr d1, [x8, :lo12:.LCPI0_0]
	; CHECK-NEXT: adrp x8, .LCPI0_1
	; CHECK-NEXT: add v0.4h, v0.4h, v1.4h			; CHECK-NEXT: add v0.4h, v0.4h, v1.4h
	; CHECK-NEXT: movi v1.4h, #1			; CHECK-NEXT: cmgt v0.4h, v2.4h, v0.4h
	; CHECK-NEXT: cmgt v0.4h, v1.4h, v0.4h
	; CHECK-NEXT: ldr d1, [x8, :lo12:.LCPI0_1]
	; CHECK-NEXT: and v0.8b, v0.8b, v1.8b
	; CHECK-NEXT: shl v0.4h, v0.4h, #15
	; CHECK-NEXT: cmlt v0.4h, v0.4h, #0
	; CHECK-NEXT: umov w0, v0.h[0]			; CHECK-NEXT: umov w0, v0.h[0]
	; CHECK-NEXT: umov w3, v0.h[3]			; CHECK-NEXT: umov w3, v0.h[3]
	; CHECK-NEXT: b foo			; CHECK-NEXT: b foo
	%tmp3 = shufflevector <4 x i16> %a1, <4 x i16> undef, <4 x i32> zeroinitializer			%tmp3 = shufflevector <4 x i16> %a1, <4 x i16> undef, <4 x i32> zeroinitializer
	%tmp5 = add <4 x i16> %tmp3, <i16 18249, i16 6701, i16 -18744, i16 -25086>			%tmp5 = add <4 x i16> %tmp3, <i16 18249, i16 6701, i16 -18744, i16 -25086>
	%tmp6 = icmp slt <4 x i16> %tmp5, <i16 1, i16 1, i16 1, i16 1>			%tmp6 = icmp slt <4 x i16> %tmp5, <i16 1, i16 1, i16 1, i16 1>
	%tmp7 = and <4 x i1> %tmp6, <i1 true, i1 false, i1 false, i1 true>			%tmp7 = and <4 x i1> %tmp6, <i1 true, i1 false, i1 false, i1 true>
	%tmp8 = sext <4 x i1> %tmp7 to <4 x i16>			%tmp8 = sext <4 x i1> %tmp7 to <4 x i16>
	[13 lines not shown]

llvm/test/CodeGen/AArch64/fast-isel-cmp-vec.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=aarch64-apple-darwin -fast-isel -verify-machineinstrs \			; RUN: llc -mtriple=aarch64-apple-darwin -fast-isel -verify-machineinstrs \
				RKSimon: Regenerate + pre-commit before committing the patch so the patch just shows the codegen diff.
	; RUN: -aarch64-enable-atomic-cfg-tidy=0 -disable-cgp -disable-branch-fold \			; RUN: -aarch64-enable-atomic-cfg-tidy=0 -disable-cgp -disable-branch-fold \
	; RUN: < %s | FileCheck %s			; RUN: < %s | FileCheck %s

	;			;
	; Verify that we don't mess up vector comparisons in fast-isel.			; Verify that we don't mess up vector comparisons in fast-isel.
	;			;

	define <2 x i32> @icmp_v2i32(<2 x i32> %a) {			define <2 x i32> @icmp_v2i32(<2 x i32> %a) {
	[9 lines not shown]
	bb2:			bb2:
	%z = zext <2 x i1> %c to <2 x i32>			%z = zext <2 x i1> %c to <2 x i32>
	ret <2 x i32> %z			ret <2 x i32> %z
	}			}

	define <2 x i32> @icmp_constfold_v2i32(<2 x i32> %a) {			define <2 x i32> @icmp_constfold_v2i32(<2 x i32> %a) {
	; CHECK-LABEL: icmp_constfold_v2i32:			; CHECK-LABEL: icmp_constfold_v2i32:
	; CHECK: ; %bb.0:			; CHECK: ; %bb.0:
	; CHECK-NEXT: movi.2s v0, #1			; CHECK-NEXT: movi.2d v0, #0xffffffffffffffff
	; CHECK-NEXT: and.8b v0, v0, v0			; CHECK-NEXT: ; %bb.1: ; %bb2
				; CHECK-NEXT: movi.2s v1, #1
				; CHECK-NEXT: and.8b v0, v0, v1
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%1 = icmp eq <2 x i32> %a, %a			%1 = icmp eq <2 x i32> %a, %a
	br label %bb2			br label %bb2
	bb2:			bb2:
	%2 = zext <2 x i1> %1 to <2 x i32>			%2 = zext <2 x i1> %1 to <2 x i32>
	ret <2 x i32> %2			ret <2 x i32> %2
	}			}

	[12 lines not shown]
	bb2:			bb2:
	%z = zext <4 x i1> %c to <4 x i32>			%z = zext <4 x i1> %c to <4 x i32>
	ret <4 x i32> %z			ret <4 x i32> %z
	}			}

	define <4 x i32> @icmp_constfold_v4i32(<4 x i32> %a) {			define <4 x i32> @icmp_constfold_v4i32(<4 x i32> %a) {
	; CHECK-LABEL: icmp_constfold_v4i32:			; CHECK-LABEL: icmp_constfold_v4i32:
	; CHECK: ; %bb.0:			; CHECK: ; %bb.0:
	; CHECK-NEXT: movi.4h v0, #1			; CHECK-NEXT: movi.2d v0, #0xffffffffffffffff
	; CHECK-NEXT: ; %bb.1: ; %bb2			; CHECK-NEXT: ; %bb.1: ; %bb2
	; CHECK-NEXT: and.8b v0, v0, v0			; CHECK-NEXT: movi.4h v1, #1
				; CHECK-NEXT: and.8b v0, v0, v1
	; CHECK-NEXT: ushll.4s v0, v0, #0			; CHECK-NEXT: ushll.4s v0, v0, #0
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%1 = icmp eq <4 x i32> %a, %a			%1 = icmp eq <4 x i32> %a, %a
	br label %bb2			br label %bb2
	bb2:			bb2:
	%2 = zext <4 x i1> %1 to <4 x i32>			%2 = zext <4 x i1> %1 to <4 x i32>
	ret <4 x i32> %2			ret <4 x i32> %2
	}			}
	[11 lines not shown]
	bb2:			bb2:
	%z = zext <16 x i1> %c to <16 x i8>			%z = zext <16 x i1> %c to <16 x i8>
	ret <16 x i8> %z			ret <16 x i8> %z
	}			}

	define <16 x i8> @icmp_constfold_v16i8(<16 x i8> %a) {			define <16 x i8> @icmp_constfold_v16i8(<16 x i8> %a) {
	; CHECK-LABEL: icmp_constfold_v16i8:			; CHECK-LABEL: icmp_constfold_v16i8:
	; CHECK: ; %bb.0:			; CHECK: ; %bb.0:
	; CHECK-NEXT: movi.16b v0, #1			; CHECK-NEXT: movi.2d v0, #0xffffffffffffffff
	; CHECK-NEXT: and.16b v0, v0, v0			; CHECK-NEXT: ; %bb.1: ; %bb2
				; CHECK-NEXT: movi.16b v1, #1
				; CHECK-NEXT: and.16b v0, v0, v1
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%1 = icmp eq <16 x i8> %a, %a			%1 = icmp eq <16 x i8> %a, %a
	br label %bb2			br label %bb2
	bb2:			bb2:
	%2 = zext <16 x i1> %1 to <16 x i8>			%2 = zext <16 x i1> %1 to <16 x i8>
	ret <16 x i8> %2			ret <16 x i8> %2
	}			}
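
The `icmp_constfold` tests above fold `icmp eq %a, %a` to true and then `zext` the result. A minimal sketch of why materialising "true" as all-ones instead of 1 leaves the final value unchanged is given below. It models a single 32-bit lane; the helper names are hypothetical and are not taken from the patch.

```cpp
#include <cstdint>

// Hypothetical model of one promoted 32-bit lane that holds an i1 value.
// Zero-extension materialises "true" as 1, sign-extension as all-ones.
constexpr uint32_t boolZExt(bool b) { return b ? 1u : 0u; }
constexpr uint32_t boolSExt(bool b) { return b ? 0xFFFFFFFFu : 0u; }

// A later zext from i1 reads only the low bit of the promoted lane,
// which corresponds to the `and` with a splat of 1 in the CHECK lines.
constexpr uint32_t zextFromPromoted(uint32_t lane) { return lane & 1u; }

// Either representation of "true" gives 1 after masking.
static_assert(zextFromPromoted(boolZExt(true)) == 1u, "zext form");
static_assert(zextFromPromoted(boolSExt(true)) == 1u, "sext form");
// "false" is 0 in both representations.
static_assert(zextFromPromoted(boolSExt(false)) == 0u, "false stays 0");
```

The same reasoning applies to the scalar `mov wN, #1` to `mov wN, #-1` changes elsewhere in this diff when the consumer only looks at bit 0.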

llvm/test/CodeGen/AArch64/funnel-shift.ll

[87 lines not shown]
}		}

; extract(concat(0b1110000, 0b1111111) << 2) = 0b1000011		; extract(concat(0b1110000, 0b1111111) << 2) = 0b1000011

declare i7 @llvm.fshl.i7(i7, i7, i7)		declare i7 @llvm.fshl.i7(i7, i7, i7)
define i7 @fshl_i7_const_fold() {		define i7 @fshl_i7_const_fold() {
; CHECK-LABEL: fshl_i7_const_fold:		; CHECK-LABEL: fshl_i7_const_fold:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: mov w0, #67		; CHECK-NEXT: mov w0, #-61
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%f = call i7 @llvm.fshl.i7(i7 112, i7 127, i7 2)		%f = call i7 @llvm.fshl.i7(i7 112, i7 127, i7 2)
ret i7 %f		ret i7 %f
}		}

define i8 @fshl_i8_const_fold_overshift_1() {		define i8 @fshl_i8_const_fold_overshift_1() {
; CHECK-LABEL: fshl_i8_const_fold_overshift_1:		; CHECK-LABEL: fshl_i8_const_fold_overshift_1:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: mov w0, #128		; CHECK-NEXT: mov w0, #-128
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%f = call i8 @llvm.fshl.i8(i8 255, i8 0, i8 15)		%f = call i8 @llvm.fshl.i8(i8 255, i8 0, i8 15)
ret i8 %f		ret i8 %f
}		}

define i8 @fshl_i8_const_fold_overshift_2() {		define i8 @fshl_i8_const_fold_overshift_2() {
; CHECK-LABEL: fshl_i8_const_fold_overshift_2:		; CHECK-LABEL: fshl_i8_const_fold_overshift_2:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
[45 lines not shown]
; CHECK-NEXT: ret
ret i64 %f		ret i64 %f
}		}

; This should work without any node-specific logic.		; This should work without any node-specific logic.

define i8 @fshl_i8_const_fold() {		define i8 @fshl_i8_const_fold() {
; CHECK-LABEL: fshl_i8_const_fold:		; CHECK-LABEL: fshl_i8_const_fold:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: mov w0, #128		; CHECK-NEXT: mov w0, #-128
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%f = call i8 @llvm.fshl.i8(i8 255, i8 0, i8 7)		%f = call i8 @llvm.fshl.i8(i8 255, i8 0, i8 7)
ret i8 %f		ret i8 %f
}		}

; Repeat everything for funnel shift right.		; Repeat everything for funnel shift right.

; General case - all operands can be variables.		; General case - all operands can be variables.
[60 lines not shown]
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%f = call i7 @llvm.fshr.i7(i7 112, i7 127, i7 2)		%f = call i7 @llvm.fshr.i7(i7 112, i7 127, i7 2)
ret i7 %f		ret i7 %f
}		}

define i8 @fshr_i8_const_fold_overshift_1() {		define i8 @fshr_i8_const_fold_overshift_1() {
; CHECK-LABEL: fshr_i8_const_fold_overshift_1:		; CHECK-LABEL: fshr_i8_const_fold_overshift_1:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: mov w0, #254		; CHECK-NEXT: mov w0, #-2
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%f = call i8 @llvm.fshr.i8(i8 255, i8 0, i8 15)		%f = call i8 @llvm.fshr.i8(i8 255, i8 0, i8 15)
ret i8 %f		ret i8 %f
}		}

define i8 @fshr_i8_const_fold_overshift_2() {		define i8 @fshr_i8_const_fold_overshift_2() {
; CHECK-LABEL: fshr_i8_const_fold_overshift_2:		; CHECK-LABEL: fshr_i8_const_fold_overshift_2:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: mov w0, #225		; CHECK-NEXT: mov w0, #-31
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%f = call i8 @llvm.fshr.i8(i8 15, i8 15, i8 11)		%f = call i8 @llvm.fshr.i8(i8 15, i8 15, i8 11)
ret i8 %f		ret i8 %f
}		}

define i8 @fshr_i8_const_fold_overshift_3() {		define i8 @fshr_i8_const_fold_overshift_3() {
; CHECK-LABEL: fshr_i8_const_fold_overshift_3:		; CHECK-LABEL: fshr_i8_const_fold_overshift_3:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: mov w0, #255		; CHECK-NEXT: mov w0, #-1
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%f = call i8 @llvm.fshr.i8(i8 0, i8 255, i8 8)		%f = call i8 @llvm.fshr.i8(i8 0, i8 255, i8 8)
ret i8 %f		ret i8 %f
}		}

; With constant shift amount, this is 'extr'.		; With constant shift amount, this is 'extr'.

define i32 @fshr_i32_const_shift(i32 %x, i32 %y) {		define i32 @fshr_i32_const_shift(i32 %x, i32 %y) {
[27 lines not shown]
; CHECK-NEXT: ret
ret i64 %f		ret i64 %f
}		}

; This should work without any node-specific logic.		; This should work without any node-specific logic.

define i8 @fshr_i8_const_fold() {		define i8 @fshr_i8_const_fold() {
; CHECK-LABEL: fshr_i8_const_fold:		; CHECK-LABEL: fshr_i8_const_fold:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: mov w0, #254		; CHECK-NEXT: mov w0, #-2
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%f = call i8 @llvm.fshr.i8(i8 255, i8 0, i8 7)		%f = call i8 @llvm.fshr.i8(i8 255, i8 0, i8 7)
ret i8 %f		ret i8 %f
}		}
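
The immediates in the constant-fold CHECK lines above change only in how the folded i7/i8 result is widened into the 32-bit `w0` register: the low bits are identical, but the upper bits are now copies of the sign bit. The following is a minimal sketch of that arithmetic, not code from the patch. `fshl`, `fshr`, `zeroExtend` and `signExtend` are hypothetical helpers written for illustration, and they assume bit widths below 32.

```cpp
#include <cstdint>

// Funnel shift left on `bits`-wide values: concatenate hi:lo, shift left by
// (amt mod bits), and keep the high half. Assumes 0 < bits < 32.
constexpr uint32_t fshl(uint32_t hi, uint32_t lo, uint32_t amt, unsigned bits) {
  const uint32_t mask = (1u << bits) - 1u;
  amt %= bits;
  if (amt == 0) return hi & mask;
  return (((hi & mask) << amt) | ((lo & mask) >> (bits - amt))) & mask;
}

// Funnel shift right: concatenate hi:lo, shift right by (amt mod bits),
// and keep the low half.
constexpr uint32_t fshr(uint32_t hi, uint32_t lo, uint32_t amt, unsigned bits) {
  const uint32_t mask = (1u << bits) - 1u;
  amt %= bits;
  if (amt == 0) return lo & mask;
  return (((hi & mask) << (bits - amt)) | ((lo & mask) >> amt)) & mask;
}

// Widen a `bits`-wide value to 32 bits, filling the upper bits with zeros.
constexpr int32_t zeroExtend(uint32_t v, unsigned bits) {
  return static_cast<int32_t>(v & ((1u << bits) - 1u));
}

// Widen a `bits`-wide value to 32 bits, copying the sign bit upwards.
constexpr int32_t signExtend(uint32_t v, unsigned bits) {
  const uint32_t sign = 1u << (bits - 1);
  v &= (1u << bits) - 1u;
  return static_cast<int32_t>(v ^ sign) - static_cast<int32_t>(sign);
}

// fshl.i7(112, 127, 2): old immediate #67, new immediate #-61.
static_assert(zeroExtend(fshl(112, 127, 2, 7), 7) == 67, "");
static_assert(signExtend(fshl(112, 127, 2, 7), 7) == -61, "");
// fshl.i8(255, 0, 15) and fshl.i8(255, 0, 7): #128 becomes #-128.
static_assert(zeroExtend(fshl(255, 0, 15, 8), 8) == 128, "");
static_assert(signExtend(fshl(255, 0, 7, 8), 8) == -128, "");
// fshr.i8(255, 0, 15): #254 becomes #-2.
static_assert(zeroExtend(fshr(255, 0, 15, 8), 8) == 254, "");
static_assert(signExtend(fshr(255, 0, 15, 8), 8) == -2, "");
// fshr.i8(15, 15, 11): #225 becomes #-31.
static_assert(signExtend(fshr(15, 15, 11, 8), 8) == -31, "");
// fshr.i8(0, 255, 8): #255 becomes #-1.
static_assert(signExtend(fshr(0, 255, 8, 8), 8) == -1, "");
```

Each `static_assert` pairs an old (zero-extended) or new (sign-extended) immediate with the corresponding test above; the low 7 or 8 bits of the register are the same in both cases.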

define i32 @fshl_i32_shift_by_bitwidth(i32 %x, i32 %y) {		define i32 @fshl_i32_shift_by_bitwidth(i32 %x, i32 %y) {
; CHECK-LABEL: fshl_i32_shift_by_bitwidth:		; CHECK-LABEL: fshl_i32_shift_by_bitwidth:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
[31 lines not shown]

llvm/test/CodeGen/AArch64/reduce-and.ll

[217 lines not shown]
; GISEL-NEXT: ret
%and_result = call i8 @llvm.vector.reduce.and.v1i8(<1 x i8> %a)		%and_result = call i8 @llvm.vector.reduce.and.v1i8(<1 x i8> %a)
ret i8 %and_result		ret i8 %and_result
}		}

define i8 @test_redand_v3i8(<3 x i8> %a) {		define i8 @test_redand_v3i8(<3 x i8> %a) {
; CHECK-LABEL: test_redand_v3i8:		; CHECK-LABEL: test_redand_v3i8:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: and w8, w0, w1		; CHECK-NEXT: and w8, w0, w1
; CHECK-NEXT: and w8, w8, w2		; CHECK-NEXT: and w0, w8, w2
; CHECK-NEXT: and w0, w8, #0xff
; CHECK-NEXT: ret		; CHECK-NEXT: ret
;		;
; GISEL-LABEL: test_redand_v3i8:		; GISEL-LABEL: test_redand_v3i8:
; GISEL: // %bb.0:		; GISEL: // %bb.0:
; GISEL-NEXT: and w8, w0, w1		; GISEL-NEXT: and w8, w0, w1
; GISEL-NEXT: and w0, w8, w2		; GISEL-NEXT: and w0, w8, w2
; GISEL-NEXT: ret		; GISEL-NEXT: ret
%and_result = call i8 @llvm.vector.reduce.and.v3i8(<3 x i8> %a)		%and_result = call i8 @llvm.vector.reduce.and.v3i8(<3 x i8> %a)
[420 lines not shown]

llvm/test/CodeGen/AArch64/redundant-copy-elim-empty-mbb.ll

	; RUN: llc < %s | FileCheck %s			; RUN: llc < %s | FileCheck %s
	; Make sure we don't crash in AArch64RedundantCopyElimination when a			; Make sure we don't crash in AArch64RedundantCopyElimination when a
	; MachineBasicBlock is empty. PR29035.			; MachineBasicBlock is empty. PR29035.

	target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"			target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
	target triple = "aarch64-unknown-linux-gnu"			target triple = "aarch64-unknown-linux-gnu"

	declare i8* @bar()			declare i8* @bar()

	; CHECK-LABEL: foo:			; CHECK-LABEL: foo:
	; CHECK: tbz			; CHECK: tbz
	; CHECK: mov{{.*}}, #1			; CHECK: mov{{.*}}, #-1
	; CHECK: ret			; CHECK: ret
	; CHECK: bl bar			; CHECK: bl bar
	; CHECK: cbnz			; CHECK: cbnz
	; CHECK: ret			; CHECK: ret
	define i1 @foo(i1 %start) {			define i1 @foo(i1 %start) {
	entry:			entry:
	br i1 %start, label %cleanup, label %if.end			br i1 %start, label %cleanup, label %if.end

	[9 lines not shown]

llvm/test/CodeGen/AArch64/statepoint-call-lowering.ll

	[171 lines not shown]
	; CHECK-NEXT: .Ltmp8:			; CHECK-NEXT: .Ltmp8:
	; CHECK-NEXT: tbz w20, #0, .LBB8_2			; CHECK-NEXT: tbz w20, #0, .LBB8_2
	; CHECK-NEXT: // %bb.1: // %left			; CHECK-NEXT: // %bb.1: // %left
	; CHECK-NEXT: mov w19, w0			; CHECK-NEXT: mov w19, w0
	; CHECK-NEXT: ldr x0, [sp, #8]			; CHECK-NEXT: ldr x0, [sp, #8]
	; CHECK-NEXT: bl consume			; CHECK-NEXT: bl consume
	; CHECK-NEXT: b .LBB8_3			; CHECK-NEXT: b .LBB8_3
	; CHECK-NEXT: .LBB8_2:			; CHECK-NEXT: .LBB8_2:
	; CHECK-NEXT: mov w19, #1			; CHECK-NEXT: mov w19, #-1
	; CHECK-NEXT: .LBB8_3: // %common.ret			; CHECK-NEXT: .LBB8_3: // %common.ret
	; CHECK-NEXT: and w0, w19, #0x1			; CHECK-NEXT: and w0, w19, #0x1
	; CHECK-NEXT: ldp x20, x19, [sp, #16] // 16-byte Folded Reload			; CHECK-NEXT: ldp x20, x19, [sp, #16] // 16-byte Folded Reload
	; CHECK-NEXT: ldr x30, [sp], #32 // 8-byte Folded Reload			; CHECK-NEXT: ldr x30, [sp], #32 // 8-byte Folded Reload
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%safepoint_token = tail call token (i64, i32, i1 (), i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_i1f(i64 0, i32 0, i1 () @return_i1, i32 0, i32 0, i32 0, i32 0) ["gc-live" (i32 addrspace(1)* %a)]			%safepoint_token = tail call token (i64, i32, i1 (), i32, i32, ...) @llvm.experimental.gc.statepoint.p0f_i1f(i64 0, i32 0, i1 () @return_i1, i32 0, i32 0, i32 0, i32 0) ["gc-live" (i32 addrspace(1)* %a)]
	br i1 %external_cond, label %left, label %right			br i1 %external_cond, label %left, label %right
	[61 lines not shown]

llvm/test/CodeGen/AArch64/sve-vector-splat.ll

[110 lines not shown]
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: mov z0.h, w0		; CHECK-NEXT: mov z0.h, w0
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%ins = insertelement <vscale x 8 x i8> undef, i8 %val, i32 0		%ins = insertelement <vscale x 8 x i8> undef, i8 %val, i32 0
%splat = shufflevector <vscale x 8 x i8> %ins, <vscale x 8 x i8> undef, <vscale x 8 x i32> zeroinitializer		%splat = shufflevector <vscale x 8 x i8> %ins, <vscale x 8 x i8> undef, <vscale x 8 x i32> zeroinitializer
ret <vscale x 8 x i8> %splat		ret <vscale x 8 x i8> %splat
}		}

define <vscale x 8 x i8> @sve_splat_8xi8_imm() {		define <vscale x 8 x i8> @sve_splat_8xi8_imm() {
		craig.topper: Pre-commit the new tests so we can see the change?
; CHECK-LABEL: sve_splat_8xi8_imm:		; CHECK-LABEL: sve_splat_8xi8_imm:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: mov w8, #255		; CHECK-NEXT: mov z0.h, #-1 // =0xffffffffffffffff
; CHECK-NEXT: mov z0.h, w8
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%ins = insertelement <vscale x 8 x i8> undef, i8 -1, i32 0		%ins = insertelement <vscale x 8 x i8> undef, i8 -1, i32 0
%splat = shufflevector <vscale x 8 x i8> %ins, <vscale x 8 x i8> undef, <vscale x 8 x i32> zeroinitializer		%splat = shufflevector <vscale x 8 x i8> %ins, <vscale x 8 x i8> undef, <vscale x 8 x i32> zeroinitializer
ret <vscale x 8 x i8> %splat		ret <vscale x 8 x i8> %splat
}		}

define <vscale x 2 x i16> @sve_splat_2xi16(i16 %val) {		define <vscale x 2 x i16> @sve_splat_2xi16(i16 %val) {
; CHECK-LABEL: sve_splat_2xi16:		; CHECK-LABEL: sve_splat_2xi16:
[14 lines not shown]
; CHECK-NEXT: ret
%ins = insertelement <vscale x 4 x i16> undef, i16 %val, i32 0		%ins = insertelement <vscale x 4 x i16> undef, i16 %val, i32 0
%splat = shufflevector <vscale x 4 x i16> %ins, <vscale x 4 x i16> undef, <vscale x 4 x i32> zeroinitializer		%splat = shufflevector <vscale x 4 x i16> %ins, <vscale x 4 x i16> undef, <vscale x 4 x i32> zeroinitializer
ret <vscale x 4 x i16> %splat		ret <vscale x 4 x i16> %splat
}		}

define <vscale x 4 x i16> @sve_splat_4xi16_imm() {		define <vscale x 4 x i16> @sve_splat_4xi16_imm() {
; CHECK-LABEL: sve_splat_4xi16_imm:		; CHECK-LABEL: sve_splat_4xi16_imm:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: mov w8, #65535		; CHECK-NEXT: mov z0.s, #-1 // =0xffffffffffffffff
; CHECK-NEXT: mov z0.s, w8
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%ins = insertelement <vscale x 4 x i16> undef, i16 -1, i32 0		%ins = insertelement <vscale x 4 x i16> undef, i16 -1, i32 0
%splat = shufflevector <vscale x 4 x i16> %ins, <vscale x 4 x i16> undef, <vscale x 4 x i32> zeroinitializer		%splat = shufflevector <vscale x 4 x i16> %ins, <vscale x 4 x i16> undef, <vscale x 4 x i32> zeroinitializer
ret <vscale x 4 x i16> %splat		ret <vscale x 4 x i16> %splat
}		}

define <vscale x 2 x i32> @sve_splat_2xi32(i32 %val) {		define <vscale x 2 x i32> @sve_splat_2xi32(i32 %val) {
; CHECK-LABEL: sve_splat_2xi32:		; CHECK-LABEL: sve_splat_2xi32:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: // kill: def $w0 killed $w0 def $x0		; CHECK-NEXT: // kill: def $w0 killed $w0 def $x0
; CHECK-NEXT: mov z0.d, x0		; CHECK-NEXT: mov z0.d, x0
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%ins = insertelement <vscale x 2 x i32> undef, i32 %val, i32 0		%ins = insertelement <vscale x 2 x i32> undef, i32 %val, i32 0
%splat = shufflevector <vscale x 2 x i32> %ins, <vscale x 2 x i32> undef, <vscale x 2 x i32> zeroinitializer		%splat = shufflevector <vscale x 2 x i32> %ins, <vscale x 2 x i32> undef, <vscale x 2 x i32> zeroinitializer
ret <vscale x 2 x i32> %splat		ret <vscale x 2 x i32> %splat
}		}

define <vscale x 2 x i32> @sve_splat_2xi32_imm() {		define <vscale x 2 x i32> @sve_splat_2xi32_imm() {
; CHECK-LABEL: sve_splat_2xi32_imm:		; CHECK-LABEL: sve_splat_2xi32_imm:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: mov w8, #-1		; CHECK-NEXT: mov z0.d, #-1 // =0xffffffffffffffff
; CHECK-NEXT: mov z0.d, x8
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%ins = insertelement <vscale x 2 x i32> undef, i32 -1, i32 0		%ins = insertelement <vscale x 2 x i32> undef, i32 -1, i32 0
%splat = shufflevector <vscale x 2 x i32> %ins, <vscale x 2 x i32> undef, <vscale x 2 x i32> zeroinitializer		%splat = shufflevector <vscale x 2 x i32> %ins, <vscale x 2 x i32> undef, <vscale x 2 x i32> zeroinitializer
ret <vscale x 2 x i32> %splat		ret <vscale x 2 x i32> %splat
}		}
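
The `_imm` tests above splat -1 into vectors whose element type is promoted to a wider lane. As a rough illustration (a sketch with hypothetical helper names, not code from the patch), the two ways of widening the constant produce the values seen in the old and new CHECK lines:

```cpp
#include <cstdint>

// Hypothetical helpers modelling promotion of a narrow constant to the
// wider lane type used for an unpacked vector.
constexpr int16_t zext8To16(int8_t v) {
  return static_cast<int16_t>(static_cast<uint8_t>(v));
}
constexpr int16_t sext8To16(int8_t v) { return static_cast<int16_t>(v); }

constexpr int32_t zext16To32(int16_t v) {
  return static_cast<int32_t>(static_cast<uint16_t>(v));
}
constexpr int32_t sext16To32(int16_t v) { return static_cast<int32_t>(v); }

constexpr int64_t zext32To64(int32_t v) {
  return static_cast<int64_t>(static_cast<uint32_t>(v));
}
constexpr int64_t sext32To64(int32_t v) { return static_cast<int64_t>(v); }

// <vscale x 8 x i8> splat of -1, lanes promoted to i16:
static_assert(zext8To16(-1) == 255, "old: mov w8, #255");
static_assert(sext8To16(-1) == -1, "new: mov z0.h, #-1");

// <vscale x 4 x i16> splat of -1, lanes promoted to i32:
static_assert(zext16To32(-1) == 65535, "old: mov w8, #65535");
static_assert(sext16To32(-1) == -1, "new: mov z0.s, #-1");

// <vscale x 2 x i32> splat of -1, lanes promoted to i64:
static_assert(zext32To64(-1) == 0xFFFFFFFFLL, "old: upper 32 bits are zero");
static_assert(sext32To64(-1) == -1, "new: mov z0.d, #-1");
```

With sign extension every promoted lane holds -1, which the new CHECK lines show being splatted by a single `mov` immediate instead of a scalar `mov` followed by a register splat.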

;; Widen/split splats of wide vector types.		;; Widen/split splats of wide vector types.

[359 lines not shown]

llvm/test/CodeGen/AArch64/unfold-masked-merge-vector-variablemask.ll

	[23 lines not shown]

	; ============================================================================ ;			; ============================================================================ ;
	; 16-bit vector width			; 16-bit vector width
	; ============================================================================ ;			; ============================================================================ ;

	define <2 x i8> @out_v2i8(<2 x i8> %x, <2 x i8> %y, <2 x i8> %mask) nounwind {			define <2 x i8> @out_v2i8(<2 x i8> %x, <2 x i8> %y, <2 x i8> %mask) nounwind {
	; CHECK-LABEL: out_v2i8:			; CHECK-LABEL: out_v2i8:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: movi d3, #0x0000ff000000ff			; CHECK-NEXT: bif v0.8b, v1.8b, v2.8b
	; CHECK-NEXT: and v0.8b, v0.8b, v2.8b
	; CHECK-NEXT: eor v2.8b, v2.8b, v3.8b
	; CHECK-NEXT: and v1.8b, v1.8b, v2.8b
	; CHECK-NEXT: orr v0.8b, v0.8b, v1.8b
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%mx = and <2 x i8> %x, %mask			%mx = and <2 x i8> %x, %mask
	%notmask = xor <2 x i8> %mask, <i8 -1, i8 -1>			%notmask = xor <2 x i8> %mask, <i8 -1, i8 -1>
	%my = and <2 x i8> %y, %notmask			%my = and <2 x i8> %y, %notmask
	%r = or <2 x i8> %mx, %my			%r = or <2 x i8> %mx, %my
	ret <2 x i8> %r			ret <2 x i8> %r
	}			}

	[11 lines not shown]

	; ============================================================================ ;			; ============================================================================ ;
	; 32-bit vector width			; 32-bit vector width
	; ============================================================================ ;			; ============================================================================ ;

	define <4 x i8> @out_v4i8(<4 x i8> %x, <4 x i8> %y, <4 x i8> %mask) nounwind {			define <4 x i8> @out_v4i8(<4 x i8> %x, <4 x i8> %y, <4 x i8> %mask) nounwind {
	; CHECK-LABEL: out_v4i8:			; CHECK-LABEL: out_v4i8:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: movi d3, #0xff00ff00ff00ff			; CHECK-NEXT: bif v0.8b, v1.8b, v2.8b
	; CHECK-NEXT: and v0.8b, v0.8b, v2.8b
	; CHECK-NEXT: eor v2.8b, v2.8b, v3.8b
	; CHECK-NEXT: and v1.8b, v1.8b, v2.8b
	; CHECK-NEXT: orr v0.8b, v0.8b, v1.8b
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%mx = and <4 x i8> %x, %mask			%mx = and <4 x i8> %x, %mask
	%notmask = xor <4 x i8> %mask, <i8 -1, i8 -1, i8 -1, i8 -1>			%notmask = xor <4 x i8> %mask, <i8 -1, i8 -1, i8 -1, i8 -1>
	%my = and <4 x i8> %y, %notmask			%my = and <4 x i8> %y, %notmask
	%r = or <4 x i8> %mx, %my			%r = or <4 x i8> %mx, %my
	ret <4 x i8> %r			ret <4 x i8> %r
	}			}

	define <4 x i8> @out_v4i8_undef(<4 x i8> %x, <4 x i8> %y, <4 x i8> %mask) nounwind {			define <4 x i8> @out_v4i8_undef(<4 x i8> %x, <4 x i8> %y, <4 x i8> %mask) nounwind {
	; CHECK-LABEL: out_v4i8_undef:			; CHECK-LABEL: out_v4i8_undef:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: movi d3, #0xff00ff00ff00ff			; CHECK-NEXT: bif v0.8b, v1.8b, v2.8b
	; CHECK-NEXT: and v0.8b, v0.8b, v2.8b
	; CHECK-NEXT: eor v2.8b, v2.8b, v3.8b
	; CHECK-NEXT: and v1.8b, v1.8b, v2.8b
	; CHECK-NEXT: orr v0.8b, v0.8b, v1.8b
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%mx = and <4 x i8> %x, %mask			%mx = and <4 x i8> %x, %mask
	%notmask = xor <4 x i8> %mask, <i8 -1, i8 -1, i8 undef, i8 -1>			%notmask = xor <4 x i8> %mask, <i8 -1, i8 -1, i8 undef, i8 -1>
	%my = and <4 x i8> %y, %notmask			%my = and <4 x i8> %y, %notmask
	%r = or <4 x i8> %mx, %my			%r = or <4 x i8> %mx, %my
	ret <4 x i8> %r			ret <4 x i8> %r
	}			}

	define <2 x i16> @out_v2i16(<2 x i16> %x, <2 x i16> %y, <2 x i16> %mask) nounwind {			define <2 x i16> @out_v2i16(<2 x i16> %x, <2 x i16> %y, <2 x i16> %mask) nounwind {
	; CHECK-LABEL: out_v2i16:			; CHECK-LABEL: out_v2i16:
	; CHECK: // %bb.0:			; CHECK: // %bb.0:
	; CHECK-NEXT: movi d3, #0x00ffff0000ffff			; CHECK-NEXT: bif v0.8b, v1.8b, v2.8b
	; CHECK-NEXT: and v0.8b, v0.8b, v2.8b
	; CHECK-NEXT: eor v2.8b, v2.8b, v3.8b
	; CHECK-NEXT: and v1.8b, v1.8b, v2.8b
	; CHECK-NEXT: orr v0.8b, v0.8b, v1.8b
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	%mx = and <2 x i16> %x, %mask			%mx = and <2 x i16> %x, %mask
	%notmask = xor <2 x i16> %mask, <i16 -1, i16 -1>			%notmask = xor <2 x i16> %mask, <i16 -1, i16 -1>
	%my = and <2 x i16> %y, %notmask			%my = and <2 x i16> %y, %notmask
	%r = or <2 x i16> %mx, %my			%r = or <2 x i16> %mx, %my
	ret <2 x i16> %r			ret <2 x i16> %r
	}			}

	[317 lines not shown]

llvm/test/CodeGen/AArch64/vecreduce-and-legalization.ll

[80 lines not shown]
; CHECK-NEXT: ret
%b = call i128 @llvm.vector.reduce.and.v1i128(<1 x i128> %a)		%b = call i128 @llvm.vector.reduce.and.v1i128(<1 x i128> %a)
ret i128 %b		ret i128 %b
}		}

define i8 @test_v3i8(<3 x i8> %a) nounwind {		define i8 @test_v3i8(<3 x i8> %a) nounwind {
; CHECK-LABEL: test_v3i8:		; CHECK-LABEL: test_v3i8:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: and w8, w0, w1		; CHECK-NEXT: and w8, w0, w1
; CHECK-NEXT: and w8, w8, w2		; CHECK-NEXT: and w0, w8, w2
; CHECK-NEXT: and w0, w8, #0xff
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%b = call i8 @llvm.vector.reduce.and.v3i8(<3 x i8> %a)		%b = call i8 @llvm.vector.reduce.and.v3i8(<3 x i8> %a)
ret i8 %b		ret i8 %b
}		}

define i8 @test_v9i8(<9 x i8> %a) nounwind {		define i8 @test_v9i8(<9 x i8> %a) nounwind {
; CHECK-LABEL: test_v9i8:		; CHECK-LABEL: test_v9i8:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
[96 lines not shown]

Diff 399589