This is an archive of the discontinued LLVM Phabricator instance.

Differential D116735

[RISCV] Adjust RV64I data layout by using n32:64 in layout string
Closed · Public

Authored by craig.topper on Jan 6 2022, 3:52 AM.

Details

Reviewers

asb
frasercrmck
mgrang
joshua-arch1

Commits

rG974e2e690b40: [RISCV] Adjust RV64I data layout by using n32:64 in layout string

Summary

Although the i32 type is illegal in the backend, RV64I has pretty good support for i32 through its W instructions.

By adding n32 to the DataLayout string, middle end optimizations will consider i32 to be a native type. One known effect of this is enabling LoopStrengthReduce on loops with i32 induction variables. This can be beneficial because C/C++ code often has loops with i32 induction variables due to the use of int or unsigned int.

If this patch exposes performance issues, those are better addressed by tuning LSR or other passes.
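To make the effect of the n component concrete, here is a minimal standalone sketch that extracts the native widths from a layout string and checks whether a given width is listed. The helpers nativeIntWidths and isNativeWidth are hypothetical names made up for this illustration; they are not LLVM APIs, and the parsing assumes a well-formed string.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not an LLVM API): collect the widths listed in the
// "n" component of a data layout string such as "...-n32:64-S128".
// Assumes a well-formed layout string.
std::vector<unsigned> nativeIntWidths(const std::string &layout) {
  std::vector<unsigned> widths;
  std::size_t pos = 0;
  while (pos < layout.size()) {
    // Components are separated by '-'.
    std::size_t end = layout.find('-', pos);
    if (end == std::string::npos)
      end = layout.size();
    const std::string spec = layout.substr(pos, end - pos);
    // Only a component of the form n<digits>[:<digits>...] lists native widths.
    if (spec.size() > 1 && spec[0] == 'n' &&
        std::isdigit(static_cast<unsigned char>(spec[1]))) {
      std::size_t i = 1;
      while (i < spec.size()) {
        std::size_t colon = spec.find(':', i);
        if (colon == std::string::npos)
          colon = spec.size();
        widths.push_back(
            static_cast<unsigned>(std::stoul(spec.substr(i, colon - i))));
        i = colon + 1;
      }
    }
    pos = end + 1;
  }
  return widths;
}

// Hypothetical helper: true if `bits` appears in the native width list.
bool isNativeWidth(const std::string &layout, unsigned bits) {
  for (unsigned w : nativeIntWidths(layout))
    if (w == bits)
      return true;
  return false;
}
```

With the two RV64 strings in this patch, isNativeWidth reports 32 as native only for the new "-n32:64-" layout; 64 is native in both.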

Diff Detail

Unit Tests: Failed (24 failed in total; the first five are listed)

Time	Test
60,110 ms	x64 debian > Clang.CodeGen/RISCV/rvv-intrinsics::vloxseg_mask.c
60,110 ms	x64 debian > Clang.CodeGen/RISCV/rvv-intrinsics::vloxseg_mask_mf.c
60,110 ms	x64 debian > Clang.CodeGen/RISCV/rvv-intrinsics::vlseg_mask.c
60,130 ms	x64 debian > Clang.CodeGen/RISCV/rvv-intrinsics::vlsegff_mask.c
60,120 ms	x64 debian > Clang.CodeGen/RISCV/rvv-intrinsics::vluxseg_mask.c

Event Timeline

joshua-arch1 created this revision. · Jan 6 2022, 3:52 AM

Herald added subscribers: VincentWu, luke957, achieveartificialintelligence and 25 others. · Jan 6 2022, 3:52 AM

joshua-arch1 requested review of this revision. · Jan 6 2022, 3:52 AM

Herald added a project: Restricted Project. · Jan 6 2022, 3:52 AM

Herald added subscribers: llvm-commits, MaskRay.

joshua-arch1 edited the summary of this revision. · Jan 6 2022, 3:53 AM

Harbormaster completed remote builds in B141871: Diff 397839. · Jan 6 2022, 4:23 AM

Thanks - I don't think much thought has gone into whether n32:64 would be reasonable for RV64, so I really appreciate you digging into it.

I've only run this against the GCC torture suite, and on average this seems to be about neutral (or mildly positive). There are a few cases where code is worse though. Here's a rather surprising example (I haven't investigated any further):

--- a/output_rv64imafdc_lp64_O1/930921-1.s
+++ b/output_rv64imafdc_lp64_O1/930921-1.s
@@ -24,18 +24,29 @@ main:                                   # @main
 # %bb.0:                                # %entry
        addi    sp, sp, -16
        sd      ra, 8(sp)                       # 8-byte Folded Spill
+       li      a0, 0
        li      a1, 0
-       li      a0, 1
-       lui     a2, 2
-       addiw   a2, a2, 1806
+       lui     a2, 699051
+       addiw   a2, a2, -1365
+       slli    a6, a2, 32
+       lui     a3, 171
+       addiw   a3, a3, -1365
+       slli    a3, a3, 12
+       addi    a3, a3, -1365
+       lui     a4, 2
+       addiw   a4, a4, 1808
 .LBB1_1:                                # %for.body
                                         # =>This Inner Loop Header: Depth=1
-       beqz    a0, .LBB1_4
+       srli    a5, a0, 33
+       slli    a2, a1, 32
+       mulhu   a2, a2, a6
+       srli    a2, a2, 33
+       bne     a2, a5, .LBB1_4
 # %bb.2:                                # %for.cond
                                         #   in Loop: Header=BB1_1 Depth=1
-       sext.w  a3, a1
        addiw   a1, a1, 1
-       bgeu    a2, a3, .LBB1_1
+       add     a0, a0, a3
+       bne     a1, a4, .LBB1_1
 # %bb.3:                                # %for.end
        li      a0, 0
        call    exit

That's trying to do return (unsigned) (((unsigned long long) x * 0xAAAAAAAB) >> 32) >> 1; for a range of x. Presumably the issue is MULW exists but not MULWHU and friends.
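For readers unfamiliar with the idiom, the expression above is the usual multiply-and-shift form of an unsigned 32-bit division by 3, since 0xAAAAAAAB equals (2^33 + 1) / 3. A minimal sketch follows; the function name divideBy3ViaMultiply is made up for this illustration and does not come from the test case.

```cpp
#include <cstdint>

// Illustrative only: unsigned 32-bit division by 3 written as a 64-bit
// multiply by 0xAAAAAAAB followed by a total right shift of 33 bits,
// mirroring the expression quoted in the comment above.
std::uint32_t divideBy3ViaMultiply(std::uint32_t x) {
  const std::uint64_t product = static_cast<std::uint64_t>(x) * 0xAAAAAAABull;
  return static_cast<std::uint32_t>(product >> 32) >> 1;
}
```

The result equals x / 3 for every 32-bit unsigned x, which is why the test can compare the two forms directly.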

The DataLayout documentation says:

This specifies a set of native integer widths for the target CPU in bits. For example, it might contain n32 for 32-bit PowerPC, n32:64 for PowerPC 64, or n8:16:32:64 for X86-64. Elements of this set are considered to support most general arithmetic operations efficiently.

It's unclear what exactly that means for RISC-V.

Having said that, the IVUsers example pointed out is just bad code in LLVM; I would imagine it should permit types that are smaller than the native integers. Otherwise i16, say, won't be used on most targets (other than X86 and a few experimental, embedded or graphics-y targets), as they only have n32:64, whereas the comment there is clearly only meant to stop it creating overly large integer types.

I believe there are two checks for DataLayout.isLegalInteger in IndVarSimplify that are interesting as well.

Possibly also SROA's, though that is unlikely to matter much as it'd be such a tiny aggregate.

This will probably require something in AutoUpgrade.cpp to fix up old IR files. I think maybe X86 does an upgrade for some address space additions.

In D116735#3225286, @jrtc27 wrote:

That's trying to do return (unsigned) (((unsigned long long) x * 0xAAAAAAAB) >> 32) >> 1; for a range of x. Presumably the issue is MULW exists but not MULWHU and friends.

So the f function gets inlined into this compare f (i) != i / 3. The i/3 gets optimized by SelectionDAG into (unsigned) (((unsigned long long) x * 0xAAAAAAAB) >> 32) >> 1. Before this patch, the automatic CSE in SelectionDAG merges this with the code from the inlined function. This makes the result of the comparison statically known, so all the code gets optimized away.

With this patch, LSR kicks in and rewrites some of the inlined code from f by removing the multiply with 0xAAAAAAAB in favor of a modified induction variable that increments by 0xAAAAAAAB on each iteration. Now there's nothing for the div by constant optimization to CSE with, so the result of the comparison is no longer statically known.

I don't think this test is a blocker for this patch.
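The rewrite described above can be pictured with a small sketch: instead of computing i * 0xAAAAAAAB on each iteration, a second variable starts at zero and is incremented by 0xAAAAAAAB. This is only an illustration of the strength-reduction idea, not LSR's actual output, and the function name runningSumMatchesMultiply is hypothetical.

```cpp
#include <cstdint>

// Illustrative only: checks that an accumulator stepped by 0xAAAAAAAB tracks
// the per-iteration product i * 0xAAAAAAAB, which is the substitution that
// strength reduction makes for the multiply in the loop.
bool runningSumMatchesMultiply(std::uint32_t iterations) {
  const std::uint64_t step = 0xAAAAAAABull;
  std::uint64_t acc = 0; // plays the role of the modified induction variable
  for (std::uint32_t i = 0; i < iterations; ++i) {
    if (acc != static_cast<std::uint64_t>(i) * step)
      return false;
    acc += step;
  }
  return true;
}
```

Once the multiply is gone from the inlined body, the divide-by-constant expansion of i / 3 has no identical expression left to merge with, which matches the explanation in the comment.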

joshua-arch1 updated this revision to Diff 398029. · Jan 6 2022, 6:47 PM

Harbormaster completed remote builds in B142000: Diff 398029. · Jan 6 2022, 7:10 PM

Cookfeces added a subscriber: Cookfeces. · Jan 6 2022, 10:02 PM

zixuan-wu added a subscriber: zixuan-wu. · Jan 7 2022, 1:07 AM

I have checked all the benchmarks.
For coremark, spec2006_int-471.omnetpp and eembc_networking_pktflow, performance improves by more than 3%, with no regression in the other cases.

craig.topper added a reviewer: craig.topper. · Jan 10 2022, 11:29 AM

In D116735#3230281, @joshua-arch1 wrote:

I have checked all the benchmarks.
For coremark, spec2006_int-471.omnetpp and eembc_networking_pktflow, performance improves by more than 3%, with no regression in the other cases.

Can you say more about what hardware and ISA extensions were used for that testing?

In D116735#3238854, @craig.topper wrote:

In D116735#3230281, @joshua-arch1 wrote:

I have checked all the benchmarks.
For coremark, spec2006_int-471.omnetpp and eembc_networking_pktflow, performance improves by more than 3%, with no regression in the other cases.

Can you say more about what hardware and ISA extensions were used for that testing?

I used a T-HEAD XuanTie C906 processor, which is based on the RV64GCV instruction set plus the THEAD instruction extension.

Herald added a subscriber: alextsao1999. · Jan 16 2022, 7:31 PM

For the following benchmarks, performance improves with no regression in the other cases (on a T-HEAD XuanTie C906 processor with the THEAD instruction extension):

spec2006_int-471.omnetpp	+18.9672%
spec2006_int-483.xalancbmk	+3.5192%
coremark	+3.8695%
eembc_networking_pktflowb1m	+4.1295%
eembc_networking_pktflowb2m	+4.5903%
eembc_networking_pktflowb4m	+4.0367%
eembc_networking_pktflowb512k	+5.2104%

Can you test on your hardware?

Herald added subscribers: pcwang-thead, eopXD. · Jan 25 2022, 11:02 PM

Ping.

In D116735#3247378, @joshua-arch1 wrote:

In D116735#3238854, @craig.topper wrote:

In D116735#3230281, @joshua-arch1 wrote:

I have checked all the benchmarks.
For coremark, spec2006_int-471.omnetpp and eembc_networking_pktflow, performance improves by more than 3%, with no regression in the other cases.

Can you say more about what hardware and ISA extensions were used for that testing?

I used a T-HEAD XuanTie C906 processor, which is based on the RV64GCV instruction set plus the THEAD instruction extension.

Was it tested using a version of LLVM that supports the THEAD instruction extension or just the public LLVM?

In D116735#3225715, @craig.topper wrote:

In D116735#3225286, @jrtc27 wrote:

That's trying to do return (unsigned) (((unsigned long long) x * 0xAAAAAAAB) >> 32) >> 1; for a range of x. Presumably the issue is MULW exists but not MULWHU and friends.

So the f function gets inlined into this compare f (i) != i / 3. The i/3 gets optimized by SelectionDAG into (unsigned) (((unsigned long long) x * 0xAAAAAAAB) >> 32) >> 1. Before this patch, the automatic CSE in SelectionDAG merges this with the code from the inlined function. This makes the result of the comparison statically known, so all the code gets optimized away.

With this patch, LSR kicks in and rewrites some of the inlined code from f by removing the multiply with 0xAAAAAAAB in favor of a modified induction variable that increments by 0xAAAAAAAB on each iteration. Now there's nothing for the div by constant optimization to CSE with, so the result of the comparison is no longer statically known.

I don't think this test is a blocker for this patch.

Is the patch expected to improve performance in theory?
Could anybody else also test whether the patch improves performance, since others may have different environments or benchmarks?

I think I'm correct in thinking that everyone so far is broadly in favour of this change, but as Craig suggests it likely wants something in AutoUpgrade.cpp to handle the change.

Does anyone feel differently?

joshua-arch1 updated this revision to Diff 411007. · Feb 23 2022, 10:39 PM

Herald added a subscriber: dexonsmith. · Feb 23 2022, 10:39 PM

joshua-arch1 updated this revision to Diff 411008. · Feb 23 2022, 10:45 PM

Ping.

Harbormaster completed remote builds in B151192: Diff 411008. · Feb 23 2022, 11:21 PM

In D116735#3326135, @asb wrote:

I think I'm correct in thinking that everyone so far is broadly in favour of this change, but as Craig suggests it likely wants something in AutoUpgrade.cpp to handle the change.

Does anyone feel differently?

I have updated this patch by doing an upgrade for RISCV in AutoUpgrade.cpp to fix up old IR files. Could anyone review again?

jrtc27 added inline comments. · Feb 27 2022, 7:43 PM

llvm/lib/IR/AutoUpgrade.cpp
4583	Don't hard-code the data layout, replace the -n64- with -n32:64-, otherwise it's a pain for downstreams like us who change the data layout depending on both the ISA and the ABI. All the other examples in this function (which aren't visible here because you uploaded a diff without context... don't do that) avoid hard-coding data layouts so you have plenty of different examples to copy from.

Reverse ping.

Herald added a project: Restricted Project. · Mar 30 2022, 4:06 PM

Herald added subscribers: sunshaoce, StephenFan, arichardson.

In D116735#3417888, @craig.topper wrote:

Reverse ping.

Sorry for the delay. The patch has been updated by replacing the '-n64-' with '-n32:64-' for RISCV in AutoUpgrade.cpp.
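As a rough standalone analogue of the substitution described above, the following sketch uses std::string instead of LLVM's StringRef. The function name upgradeRV64Layout is hypothetical; the actual change lives in UpgradeDataLayoutString and is additionally guarded by a check on the target triple, which this sketch omits.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for the RISC-V branch of the upgrade: rewrite the
// first "-n64-" component to "-n32:64-" and leave every other component of
// the layout string untouched.
std::string upgradeRV64Layout(const std::string &layout) {
  const std::string oldSpec = "-n64-";
  const std::string newSpec = "-n32:64-";
  const std::size_t pos = layout.find(oldSpec);
  if (pos == std::string::npos)
    return layout; // nothing to upgrade (for example, already "-n32:64-")
  std::string upgraded = layout;
  upgraded.replace(pos, oldSpec.size(), newSpec);
  return upgraded;
}
```

Because only the n component is rewritten, a downstream layout that differs elsewhere (pointer sizes, extra address spaces) is still upgraded correctly, which is the point of not hard-coding the whole string.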

joshua-arch1 updated this revision to Diff 419645. · Apr 1 2022, 12:00 AM

Harbormaster completed remote builds in B157329: Diff 419645. · Apr 1 2022, 1:02 AM

joshua-arch1 updated this revision to Diff 419676. · Apr 1 2022, 2:04 AM

Harbormaster completed remote builds in B157355: Diff 419676. · Apr 1 2022, 3:08 AM

jrtc27 added inline comments. · Apr 1 2022, 4:00 AM

llvm/lib/IR/AutoUpgrade.cpp
4601	Comment?
llvm/test/CodeGen/RISCV/rvv/fixed-vector-strided-load-store.ll
1077	This is a regression
llvm/unittests/Bitcode/DataLayoutUpgradeTest.cpp
35	Comment like AMDGPU?

joshua-arch1 updated this revision to Diff 419712. · Apr 1 2022, 4:22 AM

Harbormaster completed remote builds in B157384: Diff 419712. · Apr 1 2022, 5:11 AM

craig.topper added inline comments. · Apr 1 2022, 8:33 AM

llvm/test/CodeGen/RISCV/rvv/fixed-vector-strided-load-store.ll
1081	This move is interesting. I'll look closer at that.

craig.topper added inline comments. · Apr 1 2022, 1:13 PM

llvm/test/CodeGen/RISCV/rvv/fixed-vector-strided-load-store.ll
1077	This does save an add inside the loop.
1081	D122933 should fix the mv here.

I'm seeing a regression on 401.bzip2 and possibly 471.astar. And I'm not seeing large improvements on 471.omnetpp or 483.xalancbmk.

eopXD mentioned this in D123458: [LSR][RISCV] Improve test coverage for LSR in RISC-V. · Apr 9 2022, 11:58 AM

craig.topper commandeered this revision. · Oct 4 2022, 1:40 PM

craig.topper edited reviewers, added: joshua-arch1; removed: craig.topper.

Herald added a subscriber: shiva0217. · Oct 4 2022, 1:40 PM

Rebase.

Herald added a project: Restricted Project. · Oct 4 2022, 2:04 PM

Herald added a subscriber: cfe-commits.

Use TT.isRISCV64() in AutoUpgrade.cpp

Add requested comment to test.

dexonsmith removed a subscriber: dexonsmith. · Oct 4 2022, 3:01 PM

Harbormaster completed remote builds in B190316: Diff 465165. · Oct 4 2022, 3:55 PM

Ping

In D116735#3429850, @craig.topper wrote:

I'm seeing a regression on 401.bzip2 and possibly 471.astar. And I'm not seeing large improvements on 471.omnetpp or 483.xalancbmk.

LGTM. But do you still get regression on spec? Or any improvements?

In D116735#3857705, @zixuan-wu wrote:

In D116735#3429850, @craig.topper wrote:

I'm seeing a regression on 401.bzip2 and possibly 471.astar. And I'm not seeing large improvements on 471.omnetpp or 483.xalancbmk.

LGTM. But do you still get regression on spec? Or any improvements?

I only have numbers from my downstream repo which has a bunch of other changes. For one of our CPUs, I'm now seeing improvements, and another looks neutral. So I think we should move forward with this. If there are individual regressions we can work on fixing those by improving the backend or fixing cost models or whatever.

Just want to note that I have no strong opinion on this patch. It doesn't seem unreasonable, and I'm comfortable with this being an empirically driven decision.

I agree with @reames, though I do think the patch description could use a rewrite.

I've put it on the agenda for the call today so we can close this off (I figure given it's been sitting so long, waiting a few extra days to cover it in the call is no big deal). But if the data shows this overall improves things, I think it's a sensible change to make.

There were no objections on the call. Looks good to me - two minor changes before landing that would make sense:

Fraser suggested tweaking the patch description
Probably worth adding this to docs/ReleaseNotes.rst (I know we're not very consistent about doing this, but adding it now minimises the risk I miss it later!).

This revision is now accepted and ready to land. · Oct 27 2022, 8:54 AM

Add to release notes

craig.topper retitled this revision from [RISCV] Adjust RISCV data layout by using n32:64 in layout string to [RISCV] Adjust RV64I data layout by using n32:64 in layout string. · Oct 27 2022, 10:16 AM

craig.topper edited the summary of this revision.

Harbormaster completed remote builds in B194696: Diff 471209. · Oct 27 2022, 12:11 PM

arichardson added inline comments. · Oct 27 2022, 4:52 PM

llvm/docs/ReleaseNotes.rst
122 ↗	(On Diff #471209)	Without additional context I don't think this makes much sense to most readers. Before looking at this patch description I would not have been able to say what n is used for. Maybe something like "i32 has been marked as a legal integer type for RV64, improving code generation for some benchmarks"?

jrtc27 added inline comments. · Oct 27 2022, 5:14 PM

llvm/docs/ReleaseNotes.rst
122 ↗	(On Diff #471209)	"native integer type" not "legal integer type"; it still gets legalised during ISel

Refine ReleaseNotes

Harbormaster completed remote builds in B194823: Diff 471385. · Oct 27 2022, 9:22 PM

LGTM other than a nit, but I concur that a comment in AutoUpgrade would be nice.

llvm/docs/ReleaseNotes.rst
123 ↗	(On Diff #471385)	something like `, among other optimizations` to avoid full stop + "and"?

Closed by commit rG974e2e690b40: [RISCV] Adjust RV64I data layout by using n32:64 in layout string (authored by craig.topper). · Oct 28 2022, 8:27 AM

This revision was automatically updated to reflect the committed changes.

craig.topper added a commit: rG974e2e690b40: [RISCV] Adjust RV64I data layout by using n32:64 in layout string.

Revision Contents

Path	Size
clang/lib/Basic/Targets/RISCV.h	2 lines
llvm/lib/IR/AutoUpgrade.cpp	7 lines
llvm/lib/Target/RISCV/RISCVTargetMachine.cpp	2 lines
llvm/test/CodeGen/RISCV/aext-to-sext.ll	21 lines
llvm/test/CodeGen/RISCV/rvv/fixed-vector-strided-load-store.ll	25 lines
llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-buildvec.ll	12 lines
llvm/unittests/Bitcode/DataLayoutUpgradeTest.cpp	4 lines

Diff 419712

clang/lib/Basic/Targets/RISCV.h

[127 lines not shown]
   }
 };
 class LLVM_LIBRARY_VISIBILITY RISCV64TargetInfo : public RISCVTargetInfo {
 public:
   RISCV64TargetInfo(const llvm::Triple &Triple, const TargetOptions &Opts)
       : RISCVTargetInfo(Triple, Opts) {
     LongWidth = LongAlign = PointerWidth = PointerAlign = 64;
     IntMaxType = Int64Type = SignedLong;
-    resetDataLayout("e-m:e-p:64:64-i64:64-i128:128-n64-S128");
+    resetDataLayout("e-m:e-p:64:64-i64:64-i128:128-n32:64-S128");
   }

   bool setABI(const std::string &Name) override {
     if (Name == "lp64" || Name == "lp64f" || Name == "lp64d") {
       ABI = Name;
       return true;
     }
     return false;
[18 lines not shown]

llvm/lib/IR/AutoUpgrade.cpp

[4,574 lines not shown]
 }

 MDNode *llvm::upgradeInstructionLoopAttachment(MDNode &N) {
   auto *T = dyn_cast<MDTuple>(&N);
   if (!T)
     return &N;

   if (none_of(T->operands(), isOldLoopArgument))
     return &N;

   SmallVector<Metadata *, 8> Ops;
   Ops.reserve(T->getNumOperands());
   for (Metadata *MD : T->operands())
     Ops.push_back(upgradeLoopArgument(MD));

   return MDTuple::get(T->getContext(), Ops);
 }

 std::string llvm::UpgradeDataLayoutString(StringRef DL, StringRef TT) {
   Triple T(TT);
   // For AMDGPU we uprgrade older DataLayouts to include the default globals
   // address space of 1.
   if (T.isAMDGPU() && !DL.contains("-G") && !DL.startswith("G")) {
     return DL.empty() ? std::string("G1") : (DL + "-G1").str();
   }

+  if (T.isRISCV() && T.isArch64Bit()) {
+    auto I = DL.find("-n64-");
+    if (I != StringRef::npos)
+      return (DL.take_front(I) + "-n32:64-" + DL.drop_front(I + 5)).str();
+    return DL.str();
+  }
+
   std::string Res = DL.str();
   if (!T.isX86())
     return Res;

   // If the datalayout matches the expected format, add pointer size address
   // spaces to the datalayout.
   std::string AddrSpaces = "-p270:32:32-p271:32:32-p272:64:64";
   if (!DL.contains(AddrSpaces)) {
[45 lines not shown]

llvm/lib/Target/RISCV/RISCVTargetMachine.cpp

[46 lines not shown; enclosing function: extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeRISCVTarget()]
   initializeRISCVSExtWRemovalPass(*PR);
   initializeRISCVExpandPseudoPass(*PR);
   initializeRISCVInsertVSETVLIPass(*PR);
   initializeRISCVDeleteVSETVLPass(*PR);
 }

 static StringRef computeDataLayout(const Triple &TT) {
   if (TT.isArch64Bit())
-    return "e-m:e-p:64:64-i64:64-i128:128-n64-S128";
+    return "e-m:e-p:64:64-i64:64-i128:128-n32:64-S128";
   assert(TT.isArch32Bit() && "only RV32 and RV64 are currently supported");
   return "e-m:e-p:32:32-i64:64-n32-S128";
 }

 static Reloc::Model getEffectiveRelocModel(const Triple &TT,
                                            Optional<Reloc::Model> RM) {
   if (!RM.hasValue())
     return Reloc::Static;
[166 lines not shown]

llvm/test/CodeGen/RISCV/aext-to-sext.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc -mtriple=riscv64 -verify-machineinstrs < %s \
 ; RUN: | FileCheck %s -check-prefix=RV64I

 ; Make sure we don't generate an addi in the loop in
 ; addition to the addiw. Previously we type legalize the
 ; setcc use using signext and the phi use using anyext.
 ; We now detect when it would be beneficial to replace
 ; anyext with signext.

 define void @quux(i32 signext %arg, i32 signext %arg1) nounwind {
 ; RV64I-LABEL: quux:
 ; RV64I: # %bb.0: # %bb
-; RV64I-NEXT: addi sp, sp, -32
-; RV64I-NEXT: sd ra, 24(sp) # 8-byte Folded Spill
-; RV64I-NEXT: sd s0, 16(sp) # 8-byte Folded Spill
-; RV64I-NEXT: sd s1, 8(sp) # 8-byte Folded Spill
+; RV64I-NEXT: addi sp, sp, -16
+; RV64I-NEXT: sd ra, 8(sp) # 8-byte Folded Spill
+; RV64I-NEXT: sd s0, 0(sp) # 8-byte Folded Spill
 ; RV64I-NEXT: beq a0, a1, .LBB0_3
 ; RV64I-NEXT: # %bb.1: # %bb2.preheader
-; RV64I-NEXT: mv s0, a1
-; RV64I-NEXT: mv s1, a0
+; RV64I-NEXT: subw s0, a1, a0
 ; RV64I-NEXT: .LBB0_2: # %bb2
 ; RV64I-NEXT: # =>This Inner Loop Header: Depth=1
 ; RV64I-NEXT: call hoge@plt
-; RV64I-NEXT: addiw s1, s1, 1
-; RV64I-NEXT: bne s1, s0, .LBB0_2
+; RV64I-NEXT: addiw s0, s0, -1
+; RV64I-NEXT: bnez s0, .LBB0_2
 ; RV64I-NEXT: .LBB0_3: # %bb6
-; RV64I-NEXT: ld ra, 24(sp) # 8-byte Folded Reload
-; RV64I-NEXT: ld s0, 16(sp) # 8-byte Folded Reload
-; RV64I-NEXT: ld s1, 8(sp) # 8-byte Folded Reload
-; RV64I-NEXT: addi sp, sp, 32
+; RV64I-NEXT: ld ra, 8(sp) # 8-byte Folded Reload
+; RV64I-NEXT: ld s0, 0(sp) # 8-byte Folded Reload
+; RV64I-NEXT: addi sp, sp, 16
 ; RV64I-NEXT: ret
 bb:
   %tmp = icmp eq i32 %arg, %arg1
   br i1 %tmp, label %bb6, label %bb2

 bb2: ; preds = %bb2, %bb
   %tmp3 = phi i32 [ %tmp4, %bb2 ], [ %arg, %bb ]
   tail call void @hoge()
[77 lines not shown]

llvm/test/CodeGen/RISCV/rvv/fixed-vector-strided-load-store.ll

[1,064 lines not shown]
 ; CHECK-ASM-NEXT: vse8.v v8, (a2)
 ; CHECK-ASM-NEXT: addi t1, t1, -32
 ; CHECK-ASM-NEXT: addi a2, a2, 32
 ; CHECK-ASM-NEXT: addi a6, a6, 160
 ; CHECK-ASM-NEXT: bnez t1, .LBB12_3
 ; CHECK-ASM-NEXT: # %bb.4:
 ; CHECK-ASM-NEXT: beq a4, a5, .LBB12_7
 ; CHECK-ASM-NEXT: .LBB12_5:
-; CHECK-ASM-NEXT: slli a2, a3, 2
-; CHECK-ASM-NEXT: add a2, a2, a3
-; CHECK-ASM-NEXT: add a1, a1, a2
-; CHECK-ASM-NEXT: li a2, 1024
+; CHECK-ASM-NEXT: addiw a2, a3, -1024
+; CHECK-ASM-NEXT: add a0, a0, a3
+; CHECK-ASM-NEXT: slli a4, a3, 2
+; CHECK-ASM-NEXT: add a3, a4, a3
+; CHECK-ASM-NEXT: add a1, a1, a3
 ; CHECK-ASM-NEXT: .LBB12_6: # =>This Inner Loop Header: Depth=1
-; CHECK-ASM-NEXT: lb a4, 0(a1)
-; CHECK-ASM-NEXT: add a5, a0, a3
-; CHECK-ASM-NEXT: lb a6, 0(a5)
-; CHECK-ASM-NEXT: addw a4, a6, a4
-; CHECK-ASM-NEXT: sb a4, 0(a5)
-; CHECK-ASM-NEXT: addiw a4, a3, 1
-; CHECK-ASM-NEXT: addi a3, a3, 1
+; CHECK-ASM-NEXT: lb a3, 0(a1)
+; CHECK-ASM-NEXT: lb a4, 0(a0)
+; CHECK-ASM-NEXT: mv a5, a2
+; CHECK-ASM-NEXT: addw a2, a4, a3
+; CHECK-ASM-NEXT: sb a2, 0(a0)
+; CHECK-ASM-NEXT: addiw a2, a5, 1
+; CHECK-ASM-NEXT: addi a0, a0, 1
 ; CHECK-ASM-NEXT: addi a1, a1, 5
-; CHECK-ASM-NEXT: bne a4, a2, .LBB12_6
+; CHECK-ASM-NEXT: bgeu a2, a5, .LBB12_6
 ; CHECK-ASM-NEXT: .LBB12_7:
 ; CHECK-ASM-NEXT: ret
   %4 = icmp eq i32 %2, 1024
   br i1 %4, label %36, label %5

 5: ; preds = %3
   %6 = sext i32 %2 to i64
   %7 = sub i32 1023, %2
[56 lines not shown]

llvm/test/CodeGen/RISCV/rvv/fixed-vectors-fp-buildvec.ll

[294 lines not shown]
 ; RV32-NEXT: ret
 ;
 ; RV64-LABEL: splat_load_licm:
 ; RV64: # %bb.0:
 ; RV64-NEXT: lui a1, %hi(.LCPI12_0)
 ; RV64-NEXT: addi a1, a1, %lo(.LCPI12_0)
 ; RV64-NEXT: vsetivli zero, 4, e32, m1, ta, mu
 ; RV64-NEXT: vlse32.v v8, (a1), zero
-; RV64-NEXT: li a1, 0
-; RV64-NEXT: li a2, 1024
+; RV64-NEXT: li a1, 1024
 ; RV64-NEXT: .LBB12_1: # =>This Inner Loop Header: Depth=1
-; RV64-NEXT: slli a3, a1, 2
-; RV64-NEXT: add a3, a0, a3
-; RV64-NEXT: addiw a1, a1, 4
-; RV64-NEXT: vse32.v v8, (a3)
-; RV64-NEXT: bne a1, a2, .LBB12_1
+; RV64-NEXT: vse32.v v8, (a0)
+; RV64-NEXT: addiw a1, a1, -4
+; RV64-NEXT: addi a0, a0, 16
+; RV64-NEXT: bnez a1, .LBB12_1
 ; RV64-NEXT: # %bb.2:
 ; RV64-NEXT: ret
   br label %2

 2: ; preds = %2, %1
   %3 = phi i32 [ 0, %1 ], [ %6, %2 ]
   %4 = getelementptr inbounds float, float* %0, i32 %3
   %5 = bitcast float* %4 to <4 x float>*
   store <4 x float> <float 3.000000e+00, float 3.000000e+00, float 3.000000e+00, float 3.000000e+00>, <4 x float>* %5, align 4
   %6 = add nuw i32 %3, 4
   %7 = icmp eq i32 %6, 1024
   br i1 %7, label %8, label %2

 8: ; preds = %2
   ret void
 }

llvm/unittests/Bitcode/DataLayoutUpgradeTest.cpp

[25 lines not shown; enclosing test: TEST(DataLayoutUpgradeTest, ValidDataLayoutUpgrade)]
   EXPECT_EQ(DL2, "e-m:w-p:32:32-p270:32:32-p271:32:32-p272:64:64-i64:64"
                  "-f80:128-n8:16:32-S32");
   EXPECT_EQ(DL3, "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128"
                  "-n32:64-S128");

   // Check that AMDGPU targets add -G1 if it's not present.
   EXPECT_EQ(UpgradeDataLayoutString("e-p:32:32", "r600"), "e-p:32:32-G1");
   EXPECT_EQ(UpgradeDataLayoutString("e-p:64:64", "amdgcn"), "e-p:64:64-G1");
+
+  EXPECT_EQ(UpgradeDataLayoutString("e-m:e-p:64:64-i64:64-i128:128-n64-S128",
+                                    "riscv64"),
+            "e-m:e-p:64:64-i64:64-i128:128-n32:64-S128");
 }

 TEST(DataLayoutUpgradeTest, NoDataLayoutUpgrade) {
   std::string DL1 = UpgradeDataLayoutString(
       "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32"
       "-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128"
       "-n8:16:32:64-S128",
       "x86_64-unknown-linux-gnu");
[33 lines not shown]