This is an archive of the discontinued LLVM Phabricator instance.

Differential D158623

[RISCV] Reorder the stack frame objects.
Needs Review · Public

Authored by lcvon007 on Aug 23 2023, 7:47 AM.


Details

Reviewers

wangpc
craig.topper
reames

Summary

The order of stack frame objects determines their offsets relative to sp/fp. A smaller offset makes it more likely that the related instructions can be compressed, and it takes fewer instructions to build the offset immediate. Reordering the stack objects with a suitable cost model can therefore improve code size.

A precise cost model would add considerable complexity, and the overall gain isn't worth it. I reuse X86's
cost model, which uses an estimated density. The cost is computed by
density = ObjectNumUses / ObjectSize,
ObjectNumUses is the number of instructions using the frame object. The difference between X86 and RISCV is that we give ld/st instructions double weight because they are more likely to be compressed.
ObjectSize is the size of the frame object.
Code size may regress in some test cases if we don't add weight for ld/st (the reason is that more compressible ld/st instructions get offsets too large for them to be compressed). The double weight is an estimate (other values may be better in some cases).
The main ordering algorithm is that the frame object with higher density gets a shorter offset relative to sp/fp.
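
As an illustration of the cost model described in the summary, here is a minimal standalone sketch. The names `FrameObject`, `useWeight`, and `denserThan` are hypothetical and are not taken from the patch; the sketch only assumes the rule stated above (density = ObjectNumUses / ObjectSize, with compressible ld/st counted twice) and compares densities by cross-multiplication so that no division is needed.

```cpp
#include <cstdint>

// Hypothetical stand-in for the per-object data the summary describes.
struct FrameObject {
  uint32_t NumUses; // weighted static use count
  uint32_t Size;    // object size in bytes
};

// Weight of a single use: a compressible load/store counts double,
// every other instruction counts once (the factor 2 is the estimate
// mentioned in the summary).
inline uint32_t useWeight(bool IsCompressibleLdSt) {
  return IsCompressibleLdSt ? 2u : 1u;
}

// True if A has a strictly higher density than B, i.e.
//   A.NumUses / A.Size > B.NumUses / B.Size,
// rewritten as a cross-multiplication in 64 bits.
inline bool denserThan(const FrameObject &A, const FrameObject &B) {
  return static_cast<uint64_t>(A.NumUses) * B.Size >
         static_cast<uint64_t>(B.NumUses) * A.Size;
}
```

Under this sketch, a 16-byte object with a weighted use count of 5 is denser than a 4-byte object used once (5 * 4 > 1 * 16), so it would be placed closer to sp/fp.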

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

lcvon007 created this revision. Aug 23 2023, 7:47 AM

Herald added a project: Restricted Project. Aug 23 2023, 7:47 AM

Herald added subscribers: jobnoorman, luke, sunshaoce and 29 others.

lcvon007 requested review of this revision. Aug 23 2023, 7:47 AM

Herald added a project: Restricted Project. Aug 23 2023, 7:47 AM

Herald added subscribers: llvm-commits, eopXD, MaskRay.

The optimization data for code size in Oz

The optimization data for code size in Oz (symbol table stripped)

The optimization data for code size in Oz with march rv64imafd (no C extension, symbol table stripped)

The code size of milc regresses a little; I find that .text becomes smaller but .eh_frame_hdr/.eh_frame become larger.

The code size with the newest ordering algorithm (splitting the stack into groups with the same alignment)

Harbormaster completed remote builds in B254345: Diff 552712. Aug 23 2023, 8:39 AM

Can you describe the cost model in the patch description?

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
1501	firstly -> first

lcvon007 retitled this revision from [RISCV] Reorder the stack objects. to [RISCV] Reorder the stack frame objects. Aug 23 2023, 7:13 PM

lcvon007 edited the summary of this revision.

Herald added a subscriber: pengfei. Aug 23 2023, 7:13 PM

Add a description of the cost model and fix the typo.

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
1501	done

In D158623#4610566, @craig.topper wrote:

Can you describe the cost model in the patch description?

done, thanks a lot

craig.topper added inline comments. Aug 23 2023, 7:48 PM

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
530	Don't we need to check C or Zcf/Zcd for compressing FP loads and stores? They aren't compressible with just Zca.
534	Just to confirm, the vector passed to this function does not include the emergency spill slots used for register scavenging, which must be kept close to sp/fp?

Add braces to the if statement. NFC

Harbormaster completed remote builds in B254519: Diff 552963. Aug 23 2023, 8:34 PM

lcvon007 added inline comments. Aug 23 2023, 8:55 PM

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
534	Yes, they are excluded.

Add a function to check whether a ld/st is compressible, and remove
the check for the C extension because reordering may improve code size
even if the target doesn't support the C extension.

lcvon007 added inline comments. Aug 24 2023, 7:45 AM

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
530	I have added a function to check whether lw/ld/flw/fld/sw/sd/fsw/fsd is compressible. Thanks very much.

Harbormaster completed remote builds in B254629: Diff 553124. Aug 24 2023, 8:56 AM

wangpc added inline comments. Aug 25 2023, 12:01 AM

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
557	Do we really need these small inline functions? What about making them branches (manually inlining)?
585	Do we really need `IsValid`? It's always true I think (same for X86).

wangpc added inline comments. Aug 25 2023, 12:03 AM

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
585	OK, ignore it. `SortingObjects` is larger than `ObjectsToAllocate`.

wangpc added inline comments. Aug 25 2023, 12:37 AM

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
568	It seems we can also reduce stack size? I think we can enable it by default, not only for optsize. Performance should be evaluated.

Inline the compressible ld/st check functions by hand. NFC

lcvon007 added inline comments. Aug 27 2023, 7:57 PM

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
557	Done, thank you.
568	The ld/st use counts are static and the runtime behavior may differ, so it's not certain that we can improve performance. I also think it cannot always reduce stack size. For example, given object1: 4B, 4B aligned; object2: 4B, 4B aligned; object3: 8B, 8B aligned, the stack size will increase if the object order is object1, object3, object2.
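
The stack-size example in the comment on line 568 can be checked with a small standalone sketch. This is a simplified layout model, not the actual LLVM frame layout code: it assumes objects are placed at increasing offsets, each object is padded up to its own alignment, and the total is rounded up to the largest alignment. The names `Obj`, `alignTo`, and `frameSize` are hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical description of a frame object: size and alignment in bytes.
struct Obj {
  uint64_t Size;
  uint64_t Align; // assumed to be a power of two
};

// Round V up to the next multiple of A.
inline uint64_t alignTo(uint64_t V, uint64_t A) { return (V + A - 1) / A * A; }

// Simplified model: lay the objects out in the given order, inserting
// padding before each object as needed, then round the total up to the
// largest alignment seen.
inline uint64_t frameSize(const std::vector<Obj> &Order) {
  uint64_t Offset = 0;
  uint64_t MaxAlign = 1;
  for (const Obj &O : Order) {
    Offset = alignTo(Offset, O.Align) + O.Size;
    if (O.Align > MaxAlign)
      MaxAlign = O.Align;
  }
  return alignTo(Offset, MaxAlign);
}
```

In this model the order object1, object2, object3 gives a 16-byte frame, while object1, object3, object2 gives 24 bytes, which matches the claim that reordering can grow the stack.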

Harbormaster completed remote builds in B255150: Diff 553824. Aug 27 2023, 8:46 PM

Combine the checks for whether lw/sw/ld/sd are compressible, and update the test case to remove the extra mattr 'c' because
reordering has an effect even without the C extension. NFC

Please help review (the review comments have been addressed), thanks very much. @craig.topper @wangpc

Harbormaster completed remote builds in B255376: Diff 554138. Aug 28 2023, 8:16 PM

wangpc added inline comments. Aug 29 2023, 8:38 PM

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
478	Just a matter of style: can we replace this struct with a lambda?
501	We don't need `static_cast` I think.
547	`hasMinSize()` means that we only enable this optimization in `-Oz`, not in `-Os`. Is this expected? `hasOptSize()` is for both `-Os` and `-Oz`.
613	This should be tested. Please add a MIR test that uses FP.
615	Please add an `LLVM_DEBUG` to print the final frame, just like AArch64.

Use a lambda for the sort function and add debug code to print the result of the frame reorder.

lcvon007 added inline comments. Aug 30 2023, 7:34 AM

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
613	The second function in reorder-inst-compress.mir tests this.

lcvon007 added inline comments. Aug 30 2023, 7:40 AM

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
478	done, thanks
547	Oz is expected; I used RISCVMakeCompressible.cpp as a reference. Do you know when we should enable it in Os? Is it when the change decreases code size a lot while regressing performance very little? @wangpc

lcvon007 added inline comments. Aug 30 2023, 7:43 AM

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
501	The difference is whether we need to consider overflow of uint32 x uint32. I think we should keep it.
615	done, thanks
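
The `static_cast` question on line 501 comes down to where the multiplication is carried out. The following sketch, with hypothetical helper names `scaledNarrow` and `scaledWide`, shows the difference on a platform where `unsigned int` is 32 bits: a product of two 32-bit unsigned values wraps modulo 2^32 before it is widened, unless at least one operand is converted to 64 bits first. Casting one operand is enough, since the other is then converted by the usual arithmetic conversions.

```cpp
#include <cstdint>

// The product is computed in 32 bits and only then widened, so it
// wraps modulo 2^32 (assuming a 32-bit unsigned int).
inline uint64_t scaledNarrow(uint32_t NumUses, uint32_t Size) {
  return NumUses * Size;
}

// Widening one operand first makes the whole multiplication 64-bit,
// so the full product is preserved.
inline uint64_t scaledWide(uint32_t NumUses, uint32_t Size) {
  return static_cast<uint64_t>(NumUses) * Size;
}
```

With both operands equal to 65536, `scaledNarrow` yields 0 while `scaledWide` yields 4294967296. Whether such large values can occur for real frame objects is a separate question that the thread does not settle.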

wangpc added inline comments. Aug 31 2023, 1:02 AM

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
547	`Os` is an optimization level for both GCC and LLVM, `Oz` is only for LLVM. For LLVM, `Oz` means extreme code size optimization, and `Os` will consider both code size and performance. As for your patch, I think it can be enabled under `Os` since it seems that performance won't be impacted(?).

kito-cheng added inline comments. Aug 31 2023, 1:46 AM

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
547	GCC has Oz as well since GCC 12 :) https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-Oz

wangpc added inline comments. Aug 31 2023, 2:03 AM

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
547	Oh! Thanks! I didn't know it!

Enable the optimization in Oz and Os.

lcvon007 added inline comments. Aug 31 2023, 5:42 AM

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
547	Enable this opt in Oz/Os now, thanks.

Harbormaster completed remote builds in B255992: Diff 554997. Aug 31 2023, 5:45 AM

LGTM, it seems reasonable to me, but please wait for @craig.topper to see if there are more comments.

This revision is now accepted and ready to land. Aug 31 2023, 6:39 AM

reames requested changes to this revision. Aug 31 2023, 9:57 AM

reames added a subscriber: reames.

reames added inline comments.

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
470	The usual structure for this type of thing is to switch over opcode, and put the predicates in the case blocks.
491	I don't think this should be conditional on size optimization. Using compressed loads and stores for frequently accessed objects should improve performance or at least be neutral.
575	I think you're missing something really significant here. If we start with
  i32, align 4
  i8, align 1
and we reorder to:
  i8, align 1
  i32, align 4
there's an extra three bytes of padding added between the objects. This lengthens the total offset computation, and may push the second object out of the base+offset addressable range. With a badly chosen set of objects, you can also end up growing the stack very significantly due to all the extra padding. I would suggest that you start by restricting yourself to reordering objects of the same alignment (i.e. sort only within sets of objects with equal alignment).
585	I think it would be much simpler to remove isValid and have a separate map from frame index to SortedObject index. Please rework this.
1505	This looks like a separate change, and should probably be its own review with its own tests. Also, magic constants are bad. Why can't this be written in terms of StackAlign just like the non-compressed case just below?

This revision now requires changes to proceed. Aug 31 2023, 9:57 AM

Use a switch to implement isCompressibleLdOrSt.
Refine the sort algorithm from sorting all objects together to sorting each group of objects with the same alignment.
Use a separate map for the frame index mapping.

lcvon007 added inline comments. Sep 1 2023, 1:02 AM

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
575	I adjusted the algorithm to sort only objects with the same alignment (although I know it's not the best order). For frame objects with the following alignments:
  1B 4B 4B 1B 4B
I split them into four groups rather than three:
  group0: 1B
  group1: 4B 4B
  group2: 1B
  group3: 4B
This makes sure that no extra padding is added. We might get the smallest frame size if we sorted the whole set of frame objects first, but each group keeps its original position relative to the other groups, so it's hard to move an object in group A before an object in group B, or vice versa.
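
The grouping described in this reply can be sketched outside LLVM as follows. This is an illustration under simplifying assumptions, not the patch itself: `SortObj` and `sortWithinAlignmentRuns` are hypothetical names, and the comparator always puts the denser object first, whereas the patch picks the direction depending on whether sp or fp is the base register.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical per-object record used only for this sketch.
struct SortObj {
  int Index;        // position in the original list
  uint32_t Size;    // size in bytes
  uint32_t Align;   // alignment in bytes
  uint32_t NumUses; // weighted static use count
};

// Split the list into maximal runs of consecutive objects that share an
// alignment, and sort each run by density (NumUses / Size, compared by
// cross-multiplication). Objects never move across a run boundary, so
// the sequence of alignments, and with it the padding, is unchanged.
inline void sortWithinAlignmentRuns(std::vector<SortObj> &Objs) {
  auto Begin = Objs.begin();
  while (Begin != Objs.end()) {
    auto End = std::next(Begin);
    while (End != Objs.end() && End->Align == Begin->Align)
      ++End;
    std::stable_sort(Begin, End, [](const SortObj &A, const SortObj &B) {
      return static_cast<uint64_t>(A.NumUses) * B.Size >
             static_cast<uint64_t>(B.NumUses) * A.Size;
    });
    Begin = End;
  }
}
```

For the alignments 1B 4B 4B 1B 4B from the reply, only the two adjacent 4B objects can swap places; the other three objects form single-element runs and stay where they are.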

lcvon007 added inline comments. Sep 1 2023, 1:05 AM

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
1505	Removed; I will submit a separate patch for it.

lcvon007 added inline comments. Sep 1 2023, 1:07 AM

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
470	done.
585	Reworked, please help review. @reames thanks very much.

lcvon007 added inline comments. Sep 1 2023, 1:23 AM

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
491	Yes, I think it may improve performance in many scenarios, but it may also regress performance because we use the static use count of each object rather than runtime information. For example, suppose the original order of frame objects is A, B (A is in the base+offset addressable range and B is not), and the final order after sorting is B, A (B is in range but A is not). If A is accessed much more often than B at run time, performance may regress in theory. So is it OK to enable this optimization in the normal compilation mode? @reames
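
To make the "in range" notion in this reply concrete for the compressed sp-relative loads and stores, here is a small sketch. The ranges are taken from the RISC-V C extension encoding, not from the patch: C.LWSP/C.SWSP take a zero-extended 6-bit offset scaled by 4, and C.LDSP/C.SDSP take a zero-extended 6-bit offset scaled by 8. The function names are hypothetical, and other instruction forms (for example the register-based C.LW/C.SW) have different ranges that are not modeled here.

```cpp
#include <cstdint>

// C.LWSP / C.SWSP: offset is a multiple of 4 in [0, 252].
inline bool fitsCompressedWordSP(int64_t Offset) {
  return Offset >= 0 && Offset <= 252 && Offset % 4 == 0;
}

// C.LDSP / C.SDSP: offset is a multiple of 8 in [0, 504].
inline bool fitsCompressedDoubleSP(int64_t Offset) {
  return Offset >= 0 && Offset <= 504 && Offset % 8 == 0;
}
```

In the A/B scenario above, a 4-byte object at sp+248 could still use C.LWSP, while the same object moved to sp+256 could not, regardless of how often it is accessed at run time.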

Harbormaster completed remote builds in B256202: Diff 555294. Sep 4 2023, 6:22 PM

rebase main

Hi reames, craig, could you help me review my patch again? The review comments have all been addressed, thanks very much @reames @craig.topper

In D158623#4639172, @lcvon007 wrote:

Hi reames, craig, could you help me review my patch again? The review comments have all been addressed, thanks very much @reames @craig.topper

Could you help review? Thanks very much @reames @craig.topper

craig.topper added inline comments. Sep 15 2023, 1:58 PM

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
467	`return STI.hasStdExtCOrZca() || STI.hasStdExtZce()`
473	`return ((STI.hasStdExtC() || STI.hasStdExtZcf() || (STI.hasStdExtZce())) && (STI.getFLen() == 32)`
474	C.FLW should still be compressible with FLen==64 I think. It requires the F extension, but the D extension doesn't disable it. Or was FLen here supposed to be XLen?
480	Why STI.getFLen() <= 64? Shouldn't it be `== 64` or `>= 64`? Though if you see an FLD or FSD then FLEN must be at least 64 so you can probably ignore that check. `return STI.hasStdExtC() || STI.hasStdExtZcd()`
486	Make all the cases return and drop this return after the switch.

Update isCompressibleLdOrSt to address craig.topper's comments.

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
467	Fixed, thanks a lot.

lcvon007 marked 3 inline comments as done. Sep 18 2023, 1:56 AM

lcvon007 added inline comments.

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
473	done.
474	Replaced FLen with XLen, thanks very much.
480	I replaced FLen <= 64 with XLen <= 64. The reason I added this condition is that FLD/FSD can be compressed only in RV32DC and RV64DC. Please help review. @craig.topper thanks a lot.
486	done.

Harbormaster completed remote builds in B257334: Diff 556934. Sep 18 2023, 2:33 AM

ping

Could you help me review again? Phabricator will be read-only after October 1. Thanks very much. @craig.topper @reames

Revision Contents

Path	Size
llvm/lib/Target/RISCV/RISCVFrameLowering.h	8 lines
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp	155 lines
llvm/test/CodeGen/RISCV/reorder-frame-objects.mir	311 lines

Diff 556934

llvm/lib/Target/RISCV/RISCVFrameLowering.h

[... 73 lines not shown ...]
public:
TargetStackID::Value getStackIDForScalableVectors() const override;

bool isStackIdSafeForLocalArea(unsigned StackId) const override {
// We don't support putting RISC-V Vector objects into the pre-allocated
// local frame block at the moment.
return StackId != TargetStackID::ScalableVector;
}

		/// Order the symbols in the local stack.
		/// We want to place the local stack objects in some sort of sensible order.
		/// The heuristic we use is to try and pack them according to static number
		/// of uses(hot).
		void
		orderFrameObjects(const MachineFunction &MF,
		SmallVectorImpl<int> &ObjectsToAllocate) const override;

protected:
const RISCVSubtarget &STI;

private:
void determineFrameLayout(MachineFunction &MF) const;
void adjustStackForRVV(MachineFunction &MF, MachineBasicBlock &MBB,
MachineBasicBlock::iterator MBBI, const DebugLoc &DL,
int64_t Amount, MachineInstr::MIFlag Flag) const;
std::pair<int64_t, Align>
assignRVVStackObjectOffsets(MachineFunction &MF) const;
};
} // namespace llvm
#endif

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp

[... 19 lines not shown ...]
#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/CodeGen/RegisterScavenging.h"
#include "llvm/IR/DiagnosticInfo.h"
#include "llvm/MC/MCDwarf.h"
#include "llvm/Support/LEB128.h"

#include <algorithm>

		#define DEBUG_TYPE "frame-info"

using namespace llvm;

static const Register AllPopRegs[] = {
RISCV::X1, RISCV::X8, RISCV::X9, RISCV::X18, RISCV::X19,
RISCV::X20, RISCV::X21, RISCV::X22, RISCV::X23, RISCV::X24,
RISCV::X25, RISCV::X26, RISCV::X27};

// For now we use x3, a.k.a gp, as pointer to shadow call stack.
[... 412 lines not shown ...]
static MCCFIInstruction createDefCFAExpression(const TargetRegisterInfo &TRI,
DefCfaExpr.push_back(dwarf::DW_CFA_def_cfa_expression);
DefCfaExpr.append(buffer, buffer + encodeULEB128(Expr.size(), buffer));
DefCfaExpr.append(Expr.str());

return MCCFIInstruction::createEscape(nullptr, DefCfaExpr.str(), SMLoc(),
Comment.str());
}

		// Return true if MI is a load or store for which there exists a compressed
		// version.
		static bool isCompressibleLdOrSt(const MachineInstr &MI) {
		const RISCVSubtarget &STI = MI.getMF()->getSubtarget<RISCVSubtarget>();
		switch (MI.getOpcode()) {
		case RISCV::LW:
		case RISCV::SW:
		case RISCV::LD:
		case RISCV::SD:
		return STI.hasStdExtCOrZca() || STI.hasStdExtZce();
		case RISCV::FLW:
		case RISCV::FSW:
		// C.FLW/C.FSW/C.FLWSP/C.FSWSP are only supported by RV32FC
		return (STI.hasStdExtC() || STI.hasStdExtZcf() || (STI.hasStdExtZce())) &&
		STI.getXLen() == 32;
		case RISCV::FLD:
		case RISCV::FSD:
		// C.FLD/C.FSD/C.FLDSP/C.FSDSP is only supported by RV32DC and RV64DC
		return (STI.hasStdExtC() || STI.hasStdExtZcd()) && STI.getXLen() <= 64;
		default:
		return false;
		}
		}

		void RISCVFrameLowering::orderFrameObjects(
		const MachineFunction &MF, SmallVectorImpl<int> &ObjectsToAllocate) const {
		const MachineFrameInfo &MFI = MF.getFrameInfo();
		const RISCVRegisterInfo *RI = STI.getRegisterInfo();
		// It's only used to reduce codesize.
		if (!MF.getFunction().hasOptSize())
		return;
		// Don't waste time if there's nothing to do.
		if (ObjectsToAllocate.empty())
		return;

		// Struct that helps sort the stack objects.
		struct RISCVFrameSortingObject {
		unsigned ObjectIndex = 0; // Index of Object in MFI list.
		unsigned ObjectSize = 0; // Size of Object in bytes
		Align ObjectAlignment = Align(1); // Alignment of Object in bytes.
		unsigned ObjectNumUses = 0; // Object static number of uses.
		};

		// Key: index of object in MFI list.
		// Value: index of sorting object in SortingObjects vector.
		DenseMap<int, unsigned> ObjIdxToSortIdx;
		std::vector<RISCVFrameSortingObject> SortingObjects(ObjectsToAllocate.size());

		// Init SortingObjects.
		// The stack address of dynamic objects (size is zero) is only affected by
		// the total stack size, so they don't need special handling.
		for (const auto &[Idx, Obj] : enumerate(ObjectsToAllocate)) {
		SortingObjects[Idx].ObjectIndex = Obj;
		SortingObjects[Idx].ObjectAlignment = MFI.getObjectAlign(Obj);
		SortingObjects[Idx].ObjectSize = MFI.getObjectSize(Obj);
		// Save index mapping info.
		ObjIdxToSortIdx[Obj] = Idx;
		}

		// Count the number of uses for each object.
		for (auto &MBB : MF) {
		for (auto &MI : MBB) {
		if (MI.isDebugInstr())
		continue;
		for (const MachineOperand &MO : MI.operands()) {
		// Check to see if it's a local stack symbol.
		if (!MO.isFI())
		continue;
		int Index = MO.getIndex();
		// Check to see if it falls within our range.
		if (Index >= 0 && Index < MFI.getObjectIndexEnd()) {
		if (ObjIdxToSortIdx.find(Index) != ObjIdxToSortIdx.end()) {
		if (isCompressibleLdOrSt(MI))
		// ld/st is more likely to be compressed, so increase its
		// weight; the factor of 2 is an estimate.
		SortingObjects[ObjIdxToSortIdx[Index]].ObjectNumUses += 2;
		else
		SortingObjects[ObjIdxToSortIdx[Index]].ObjectNumUses++;
		}
		}
		}
		}
		}

		bool UseSpAsBase = true;
		// Access offset of the FP.
		if (!RI->hasStackRealignment(MF) && hasFP(MF))
		UseSpAsBase = false;

		// Split SortingObjects into multiple groups that have objects with
		// same alignment and sort them in each group to avoid adding
		// extra padding.
		// For example, suppose that the alignments of the objects in
		// SortingObjects are as follows:
		// 1B 1B 4B 1B 4B 4B
		// They are split into four groups:
		// group0(1B, 1B)
		// group1(4B)
		// group2(1B)
		// group3(4B, 4B)
		for (auto SortBegin = SortingObjects.begin(), SortEnd = SortingObjects.end();
		SortBegin != SortEnd;) {
		auto SortGroupEnd = std::next(SortBegin);
		while (SortGroupEnd != SortingObjects.end() &&
		SortGroupEnd->ObjectAlignment == SortBegin->ObjectAlignment)
		++SortGroupEnd;
		// The current comparison algorithm is to use an estimated
		// "density". This takes into consideration the size and number of
		// uses each object has in order to roughly minimize code size.
		// For example, an object of size 16B that is referenced 5 times
		// will get higher priority than 4B objects referenced 1 time.
		// The stack symbols with higher priority have shorter offsets relative
		// to sp/fp, so that stack-related instructions using them are more
		// likely to be improved.
		std::stable_sort(SortBegin, SortGroupEnd,
		[&UseSpAsBase](const RISCVFrameSortingObject &A,
		const RISCVFrameSortingObject &B) {
		uint64_t DensityAScaled, DensityBScaled;

		// The density is calculated by doing :
		// (double)DensityA = A.ObjectNumUses / A.ObjectSize
		// (double)DensityB = B.ObjectNumUses / B.ObjectSize
		// Since this approach may cause inconsistencies in
		// the floating point <, >, == comparisons, depending on
		// the floating point model with which the compiler was
		// built, we're going to scale both sides by multiplying
		// with A.ObjectSize * B.ObjectSize. This ends up
		// factoring away the division and, with it, the need for
		// any floating point arithmetic.
		DensityAScaled = static_cast<uint64_t>(A.ObjectNumUses) *
		static_cast<uint64_t>(B.ObjectSize);
		DensityBScaled = static_cast<uint64_t>(B.ObjectNumUses) *
		static_cast<uint64_t>(A.ObjectSize);
		// Make sure object with higher density is closer to
		// sp/fp.
		return UseSpAsBase ? DensityAScaled < DensityBScaled
		: DensityAScaled > DensityBScaled;
		});

		SortBegin = SortGroupEnd;
		}
		// Now modify the original list to represent the final order that
		// we want.
		for (const auto &[Idx, Obj] : enumerate(SortingObjects)) {
		ObjectsToAllocate[Idx] = Obj.ObjectIndex;
		}

		LLVM_DEBUG(dbgs() << "Final frame order:\n"; for (auto &Obj
		: ObjectsToAllocate) {
		dbgs() << "Frame object index: " << Obj << "\n";
		});
		}
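The division-free density comparison used in the comparator above can be tried in isolation. The following is a minimal sketch under stated assumptions: `SortObj`, `denserThan`, and `orderByDensity` are hypothetical names that only mirror the fields visible in the diff, object sizes are assumed to be nonzero, and `std::stable_sort` is a choice made here rather than something the diff confirms.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical mirror of the fields the comparator reads; not the patch's type.
struct SortObj {
  int ObjectIndex;
  unsigned ObjectNumUses;
  unsigned ObjectSize; // assumed nonzero
};

// True if A is strictly denser than B, that is
//   A.NumUses / A.Size > B.NumUses / B.Size,
// evaluated by cross-multiplying in 64 bits so no floating point is needed.
bool denserThan(const SortObj &A, const SortObj &B) {
  assert(A.ObjectSize != 0 && B.ObjectSize != 0);
  uint64_t AScaled = uint64_t(A.ObjectNumUses) * uint64_t(B.ObjectSize);
  uint64_t BScaled = uint64_t(B.ObjectNumUses) * uint64_t(A.ObjectSize);
  return AScaled > BScaled;
}

// Returns object indices in sorted order. As in the diff, density ascends
// when sp is the base register and descends otherwise.
std::vector<int> orderByDensity(std::vector<SortObj> Objs, bool UseSpAsBase) {
  std::stable_sort(Objs.begin(), Objs.end(),
                   [UseSpAsBase](const SortObj &A, const SortObj &B) {
                     return UseSpAsBase ? denserThan(B, A) : denserThan(A, B);
                   });
  std::vector<int> Order;
  for (const SortObj &O : Objs)
    Order.push_back(O.ObjectIndex);
  return Order;
}
```

For example, an object with 5 uses and 16 bytes is denser than one with 1 use and 4 bytes, because 5 * 4 = 20 exceeds 1 * 16 = 16.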

		void RISCVFrameLowering::emitPrologue(MachineFunction &MF,
		MachineBasicBlock &MBB) const {
		MachineFrameInfo &MFI = MF.getFrameInfo();
		wangpc (Not Done): This should be tested. Please add a MIR test that uses FP.
		lcvon007 (Author, Done): The second function in reorder-inst-compress.mir tests this.
		auto *RVFI = MF.getInfo<RISCVMachineFunctionInfo>();
		const RISCVRegisterInfo *RI = STI.getRegisterInfo();
		wangpc (Not Done): Please add a `LLVM_DEBUG` to print the final frame just like AArch64.
		lcvon007 (Author, Done): Done, thanks.
		const RISCVInstrInfo *TII = STI.getInstrInfo();
		MachineBasicBlock::iterator MBBI = MBB.begin();

		Register FPReg = getFPReg(STI);
		Register SPReg = getSPReg(STI);
		Register BPReg = RISCVABI::getBPReg();

		// Debug location must be unknown since the first debug location is used
		[869 lines not shown]
		if (STI.hasStdExtCOrZca()) {
		StackSize <= 2047 * 2 + CompressLen) ||
		StackSize > 2048 * 3 - StackAlign)
		return true;

		return false;
		};
		// In the epilogue, addi sp, sp, 496 is used to recover the sp and it
		// can be compressed(C.ADDI16SP, offset can be [-512, 496]), but
		// addi sp, sp, 512 can not be compressed. So try to use 496 first.
		craig.topper (Not Done): firstly -> first
		lcvon007 (Author, Done): Done.
		const uint64_t ADDI16SPCompressLen = 496;
		if (STI.is64Bit() && CanCompress(ADDI16SPCompressLen))
		return ADDI16SPCompressLen;
		if (CanCompress(RVCompressLen))
		reames (Not Done): This looks like a separate change, and should probably be its own review with its own tests. Also, magic constants are bad. Why can't this be written in terms of StackAlign just like the non-compressed case just below?
		lcvon007 (Author, Done): Removed; I will submit another commit for it.
		return RVCompressLen;
		}
		return 2048 - StackAlign;
		}
		return 0;
		}
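The comment above about 496 versus 512 follows from the immediate ranges of the two encodings. As a minimal sketch, with `fitsCAddi16Sp` and `fitsAddi` as hypothetical helper names that do not appear in the patch: C.ADDI16SP takes a nonzero multiple of 16 in [-512, 496], while the uncompressed ADDI takes any 12-bit signed immediate.

```cpp
#include <cassert>
#include <cstdint>

// C.ADDI16SP encodes a nonzero immediate that is a multiple of 16 in the
// range [-512, 496]. Hypothetical helper, for illustration only.
constexpr bool fitsCAddi16Sp(int64_t Imm) {
  return Imm != 0 && Imm % 16 == 0 && Imm >= -512 && Imm <= 496;
}

// The uncompressed ADDI accepts any 12-bit signed immediate.
constexpr bool fitsAddi(int64_t Imm) { return Imm >= -2048 && Imm <= 2047; }

static_assert(fitsCAddi16Sp(496), "addi sp, sp, 496 is compressible");
static_assert(!fitsCAddi16Sp(512), "addi sp, sp, 512 is not compressible");
static_assert(fitsAddi(512), "512 still fits the uncompressed ADDI");
```

Under this model, an sp adjustment of 496 can use the 2-byte encoding and 512 needs the 4-byte one, which is the motivation the comment gives for trying 496 first.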

		bool RISCVFrameLowering::spillCalleeSavedRegisters(
		[188 lines not shown]

llvm/test/CodeGen/RISCV/reorder-frame-objects.mir

This file was added.

				# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 2
				# RUN: llc -march=riscv64 -x mir -run-pass=prologepilog -stack-symbol-ordering=0 \
				# RUN: -verify-machineinstrs < %s | FileCheck -check-prefixes=CHECK-RV64-NO-REORDER %s
				# RUN: llc -march=riscv64 -x mir -run-pass=prologepilog \
				# RUN: -verify-machineinstrs < %s | FileCheck -check-prefixes=CHECK-RV64-REORDER %s
				--- |

				define dso_local void @_Z12stack_use_spv() local_unnamed_addr #0 {
				entry:
				ret void
				}

				declare dso_local void @_Z7callee0Pi(ptr noundef) local_unnamed_addr #0

				declare dso_local void @_Z7callee1Pc(ptr noundef) local_unnamed_addr #0

				define dso_local void @_Z12stack_use_fpjj(i32 noundef signext %m, i32 noundef signext %n) local_unnamed_addr #0 {
				entry:
				ret void
				}

				attributes #0 = { minsize optsize }

				...
				---
				name: _Z12stack_use_spv
				alignment: 2
				tracksRegLiveness: true
				tracksDebugUserValues: true
				frameInfo:
				maxAlignment: 4
				hasCalls: true
				localFrameSize: 2072
				stack:
				- { id: 0, size: 4, alignment: 4, local-offset: -4 }
				- { id: 1, size: 1, alignment: 1, local-offset: -5 }
				- { id: 2, size: 16, alignment: 4, local-offset: -24 }
				- { id: 3, size: 2048, alignment: 4, local-offset: -2072 }
				machineFunctionInfo:
				varArgsFrameIndex: 0
				varArgsSaveSize: 0
				body: |
				bb.0.entry:
				; CHECK-RV64-NO-REORDER-LABEL: name: _Z12stack_use_spv
				; CHECK-RV64-NO-REORDER: liveins: $x1
				; CHECK-RV64-NO-REORDER-NEXT: {{ $}}
				; CHECK-RV64-NO-REORDER-NEXT: $x2 = frame-setup ADDI $x2, -2032
				; CHECK-RV64-NO-REORDER-NEXT: frame-setup CFI_INSTRUCTION def_cfa_offset 2032
				; CHECK-RV64-NO-REORDER-NEXT: SD killed $x1, $x2, 2024 :: (store (s64) into %stack.4)
				; CHECK-RV64-NO-REORDER-NEXT: frame-setup CFI_INSTRUCTION offset $x1, -8
				; CHECK-RV64-NO-REORDER-NEXT: $x2 = frame-setup ADDI $x2, -64
				; CHECK-RV64-NO-REORDER-NEXT: frame-setup CFI_INSTRUCTION def_cfa_offset 2096
				; CHECK-RV64-NO-REORDER-NEXT: $x10 = ADDI $x2, 2047
				; CHECK-RV64-NO-REORDER-NEXT: $x10 = ADDI killed $x10, 37
				; CHECK-RV64-NO-REORDER-NEXT: PseudoCALL target-flags(riscv-call) @_Z7callee0Pi, csr_ilp32_lp64, implicit-def dead $x1, implicit killed $x10, implicit-def $x2
				; CHECK-RV64-NO-REORDER-NEXT: $x10 = ADDI $x2, 2047
				; CHECK-RV64-NO-REORDER-NEXT: $x10 = ADDI killed $x10, 36
				; CHECK-RV64-NO-REORDER-NEXT: PseudoCALL target-flags(riscv-call) @_Z7callee1Pc, csr_ilp32_lp64, implicit-def dead $x1, implicit killed $x10, implicit-def $x2
				; CHECK-RV64-NO-REORDER-NEXT: $x10 = ADDI $x2, 2047
				; CHECK-RV64-NO-REORDER-NEXT: $x10 = ADDI killed $x10, 17
				; CHECK-RV64-NO-REORDER-NEXT: PseudoCALL target-flags(riscv-call) @_Z7callee0Pi, csr_ilp32_lp64, implicit-def dead $x1, implicit killed $x10, implicit-def $x2
				; CHECK-RV64-NO-REORDER-NEXT: $x10 = ADDI $x2, 2047
				; CHECK-RV64-NO-REORDER-NEXT: $x10 = ADDI killed $x10, 17
				; CHECK-RV64-NO-REORDER-NEXT: PseudoCALL target-flags(riscv-call) @_Z7callee0Pi, csr_ilp32_lp64, implicit-def dead $x1, implicit killed $x10, implicit-def $x2
				; CHECK-RV64-NO-REORDER-NEXT: $x10 = ADDI $x2, 2047
				; CHECK-RV64-NO-REORDER-NEXT: $x10 = ADDI killed $x10, 17
				; CHECK-RV64-NO-REORDER-NEXT: PseudoCALL target-flags(riscv-call) @_Z7callee0Pi, csr_ilp32_lp64, implicit-def dead $x1, implicit killed $x10, implicit-def $x2
				; CHECK-RV64-NO-REORDER-NEXT: $x10 = ADDI $x2, 2047
				; CHECK-RV64-NO-REORDER-NEXT: $x10 = ADDI killed $x10, 17
				; CHECK-RV64-NO-REORDER-NEXT: PseudoCALL target-flags(riscv-call) @_Z7callee0Pi, csr_ilp32_lp64, implicit-def dead $x1, implicit killed $x10, implicit-def $x2
				; CHECK-RV64-NO-REORDER-NEXT: $x10 = ADDI $x2, 2047
				; CHECK-RV64-NO-REORDER-NEXT: $x10 = ADDI killed $x10, 17
				; CHECK-RV64-NO-REORDER-NEXT: PseudoCALL target-flags(riscv-call) @_Z7callee0Pi, csr_ilp32_lp64, implicit-def dead $x1, implicit killed $x10, implicit-def $x2
				; CHECK-RV64-NO-REORDER-NEXT: $x10 = ADDI $x2, 16
				; CHECK-RV64-NO-REORDER-NEXT: PseudoCALL target-flags(riscv-call) @_Z7callee0Pi, csr_ilp32_lp64, implicit-def dead $x1, implicit killed $x10, implicit-def $x2
				; CHECK-RV64-NO-REORDER-NEXT: $x2 = frame-destroy ADDI $x2, 64
				; CHECK-RV64-NO-REORDER-NEXT: $x1 = LD $x2, 2024 :: (load (s64) from %stack.4)
				; CHECK-RV64-NO-REORDER-NEXT: $x2 = frame-destroy ADDI $x2, 2032
				; CHECK-RV64-NO-REORDER-NEXT: PseudoRET
				;
				; CHECK-RV64-REORDER-LABEL: name: _Z12stack_use_spv
				; CHECK-RV64-REORDER: liveins: $x1
				; CHECK-RV64-REORDER-NEXT: {{ $}}
				; CHECK-RV64-REORDER-NEXT: $x2 = frame-setup ADDI $x2, -2032
				; CHECK-RV64-REORDER-NEXT: frame-setup CFI_INSTRUCTION def_cfa_offset 2032
				; CHECK-RV64-REORDER-NEXT: SD killed $x1, $x2, 2024 :: (store (s64) into %stack.4)
				; CHECK-RV64-REORDER-NEXT: frame-setup CFI_INSTRUCTION offset $x1, -8
				; CHECK-RV64-REORDER-NEXT: $x2 = frame-setup ADDI $x2, -64
				; CHECK-RV64-REORDER-NEXT: frame-setup CFI_INSTRUCTION def_cfa_offset 2096
				; CHECK-RV64-REORDER-NEXT: $x10 = ADDI $x2, 2047
				; CHECK-RV64-REORDER-NEXT: $x10 = ADDI killed $x10, 37
				; CHECK-RV64-REORDER-NEXT: PseudoCALL target-flags(riscv-call) @_Z7callee0Pi, csr_ilp32_lp64, implicit-def dead $x1, implicit killed $x10, implicit-def $x2
				; CHECK-RV64-REORDER-NEXT: $x10 = ADDI $x2, 2047
				; CHECK-RV64-REORDER-NEXT: $x10 = ADDI killed $x10, 36
				; CHECK-RV64-REORDER-NEXT: PseudoCALL target-flags(riscv-call) @_Z7callee1Pc, csr_ilp32_lp64, implicit-def dead $x1, implicit killed $x10, implicit-def $x2
				; CHECK-RV64-REORDER-NEXT: $x10 = ADDI $x2, 16
				; CHECK-RV64-REORDER-NEXT: PseudoCALL target-flags(riscv-call) @_Z7callee0Pi, csr_ilp32_lp64, implicit-def dead $x1, implicit killed $x10, implicit-def $x2
				; CHECK-RV64-REORDER-NEXT: $x10 = ADDI $x2, 16
				; CHECK-RV64-REORDER-NEXT: PseudoCALL target-flags(riscv-call) @_Z7callee0Pi, csr_ilp32_lp64, implicit-def dead $x1, implicit killed $x10, implicit-def $x2
				; CHECK-RV64-REORDER-NEXT: $x10 = ADDI $x2, 16
				; CHECK-RV64-REORDER-NEXT: PseudoCALL target-flags(riscv-call) @_Z7callee0Pi, csr_ilp32_lp64, implicit-def dead $x1, implicit killed $x10, implicit-def $x2
				; CHECK-RV64-REORDER-NEXT: $x10 = ADDI $x2, 16
				; CHECK-RV64-REORDER-NEXT: PseudoCALL target-flags(riscv-call) @_Z7callee0Pi, csr_ilp32_lp64, implicit-def dead $x1, implicit killed $x10, implicit-def $x2
				; CHECK-RV64-REORDER-NEXT: $x10 = ADDI $x2, 16
				; CHECK-RV64-REORDER-NEXT: PseudoCALL target-flags(riscv-call) @_Z7callee0Pi, csr_ilp32_lp64, implicit-def dead $x1, implicit killed $x10, implicit-def $x2
				; CHECK-RV64-REORDER-NEXT: $x10 = ADDI $x2, 32
				; CHECK-RV64-REORDER-NEXT: PseudoCALL target-flags(riscv-call) @_Z7callee0Pi, csr_ilp32_lp64, implicit-def dead $x1, implicit killed $x10, implicit-def $x2
				; CHECK-RV64-REORDER-NEXT: $x2 = frame-destroy ADDI $x2, 64
				; CHECK-RV64-REORDER-NEXT: $x1 = LD $x2, 2024 :: (load (s64) from %stack.4)
				; CHECK-RV64-REORDER-NEXT: $x2 = frame-destroy ADDI $x2, 2032
				; CHECK-RV64-REORDER-NEXT: PseudoRET
				ADJCALLSTACKDOWN 0, 0, implicit-def dead $x2, implicit $x2
				$x10 = ADDI %stack.0, 0
				PseudoCALL target-flags(riscv-call) @_Z7callee0Pi, csr_ilp32_lp64, implicit-def dead $x1, implicit killed $x10, implicit-def $x2
				ADJCALLSTACKUP 0, 0, implicit-def dead $x2, implicit $x2
				ADJCALLSTACKDOWN 0, 0, implicit-def dead $x2, implicit $x2
				$x10 = ADDI %stack.1, 0
				PseudoCALL target-flags(riscv-call) @_Z7callee1Pc, csr_ilp32_lp64, implicit-def dead $x1, implicit killed $x10, implicit-def $x2
				ADJCALLSTACKUP 0, 0, implicit-def dead $x2, implicit $x2
				ADJCALLSTACKDOWN 0, 0, implicit-def dead $x2, implicit $x2
				$x10 = ADDI %stack.2, 0
				PseudoCALL target-flags(riscv-call) @_Z7callee0Pi, csr_ilp32_lp64, implicit-def dead $x1, implicit killed $x10, implicit-def $x2
				ADJCALLSTACKUP 0, 0, implicit-def dead $x2, implicit $x2
				ADJCALLSTACKDOWN 0, 0, implicit-def dead $x2, implicit $x2
				$x10 = ADDI %stack.2, 0
				PseudoCALL target-flags(riscv-call) @_Z7callee0Pi, csr_ilp32_lp64, implicit-def dead $x1, implicit killed $x10, implicit-def $x2
				ADJCALLSTACKUP 0, 0, implicit-def dead $x2, implicit $x2
				ADJCALLSTACKDOWN 0, 0, implicit-def dead $x2, implicit $x2
				$x10 = ADDI %stack.2, 0
				PseudoCALL target-flags(riscv-call) @_Z7callee0Pi, csr_ilp32_lp64, implicit-def dead $x1, implicit killed $x10, implicit-def $x2
				ADJCALLSTACKUP 0, 0, implicit-def dead $x2, implicit $x2
				ADJCALLSTACKDOWN 0, 0, implicit-def dead $x2, implicit $x2
				$x10 = ADDI %stack.2, 0
				PseudoCALL target-flags(riscv-call) @_Z7callee0Pi, csr_ilp32_lp64, implicit-def dead $x1, implicit killed $x10, implicit-def $x2
				ADJCALLSTACKUP 0, 0, implicit-def dead $x2, implicit $x2
				ADJCALLSTACKDOWN 0, 0, implicit-def dead $x2, implicit $x2
				$x10 = ADDI %stack.2, 0
				PseudoCALL target-flags(riscv-call) @_Z7callee0Pi, csr_ilp32_lp64, implicit-def dead $x1, implicit killed $x10, implicit-def $x2
				ADJCALLSTACKUP 0, 0, implicit-def dead $x2, implicit $x2
				ADJCALLSTACKDOWN 0, 0, implicit-def dead $x2, implicit $x2
				$x10 = ADDI %stack.3, 0
				PseudoCALL target-flags(riscv-call) @_Z7callee0Pi, csr_ilp32_lp64, implicit-def dead $x1, implicit killed $x10, implicit-def $x2
				ADJCALLSTACKUP 0, 0, implicit-def dead $x2, implicit $x2
				PseudoRET

				...
				---
				name: _Z12stack_use_fpjj
				alignment: 2
				tracksRegLiveness: true
				tracksDebugUserValues: true
				liveins:
				- { reg: '$x10' }
				- { reg: '$x11' }
				frameInfo:
				maxAlignment: 4
				hasCalls: true
				localFrameSize: 2068
				stack:
				- { id: 0, size: 2064, alignment: 4, local-offset: -2064 }
				- { id: 1, size: 4, alignment: 4, local-offset: -2068 }
				- { id: 2, type: variable-sized, alignment: 1, local-offset: -2068 }
				- { id: 3, type: variable-sized, alignment: 1, local-offset: -2068 }
				machineFunctionInfo:
				varArgsFrameIndex: 0
				varArgsSaveSize: 0
				body: |
				bb.0.entry:
				liveins: $x10, $x11

				; CHECK-RV64-NO-REORDER-LABEL: name: _Z12stack_use_fpjj
				; CHECK-RV64-NO-REORDER: liveins: $x10, $x11, $x1, $x9, $x18, $x19
				; CHECK-RV64-NO-REORDER-NEXT: {{ $}}
				; CHECK-RV64-NO-REORDER-NEXT: $x2 = frame-setup ADDI $x2, -2032
				; CHECK-RV64-NO-REORDER-NEXT: frame-setup CFI_INSTRUCTION def_cfa_offset 2032
				; CHECK-RV64-NO-REORDER-NEXT: SD killed $x1, $x2, 2024 :: (store (s64) into %stack.4)
				; CHECK-RV64-NO-REORDER-NEXT: SD killed $x8, $x2, 2016 :: (store (s64) into %stack.5)
				; CHECK-RV64-NO-REORDER-NEXT: SD killed $x9, $x2, 2008 :: (store (s64) into %stack.6)
				; CHECK-RV64-NO-REORDER-NEXT: SD killed $x18, $x2, 2000 :: (store (s64) into %stack.7)
				; CHECK-RV64-NO-REORDER-NEXT: SD killed $x19, $x2, 1992 :: (store (s64) into %stack.8)
				; CHECK-RV64-NO-REORDER-NEXT: frame-setup CFI_INSTRUCTION offset $x1, -8
				; CHECK-RV64-NO-REORDER-NEXT: frame-setup CFI_INSTRUCTION offset $x8, -16
				; CHECK-RV64-NO-REORDER-NEXT: frame-setup CFI_INSTRUCTION offset $x9, -24
				; CHECK-RV64-NO-REORDER-NEXT: frame-setup CFI_INSTRUCTION offset $x18, -32
				; CHECK-RV64-NO-REORDER-NEXT: frame-setup CFI_INSTRUCTION offset $x19, -40
				; CHECK-RV64-NO-REORDER-NEXT: $x8 = frame-setup ADDI $x2, 2032
				; CHECK-RV64-NO-REORDER-NEXT: frame-setup CFI_INSTRUCTION def_cfa $x8, 0
				; CHECK-RV64-NO-REORDER-NEXT: $x2 = frame-setup ADDI $x2, -96
				; CHECK-RV64-NO-REORDER-NEXT: renamable $x19 = COPY $x2
				; CHECK-RV64-NO-REORDER-NEXT: renamable $x10 = SLLI killed renamable $x10, 32
				; CHECK-RV64-NO-REORDER-NEXT: renamable $x10 = SRLI killed renamable $x10, 30
				; CHECK-RV64-NO-REORDER-NEXT: renamable $x10 = nuw ADDI killed renamable $x10, 15
				; CHECK-RV64-NO-REORDER-NEXT: renamable $x10 = ANDI killed renamable $x10, -16
				; CHECK-RV64-NO-REORDER-NEXT: renamable $x18 = SUB $x2, killed renamable $x10
				; CHECK-RV64-NO-REORDER-NEXT: $x2 = COPY renamable $x18
				; CHECK-RV64-NO-REORDER-NEXT: renamable $x11 = SLLI killed renamable $x11, 32
				; CHECK-RV64-NO-REORDER-NEXT: renamable $x11 = SRLI killed renamable $x11, 30
				; CHECK-RV64-NO-REORDER-NEXT: renamable $x11 = nuw ADDI killed renamable $x11, 15
				; CHECK-RV64-NO-REORDER-NEXT: renamable $x11 = ANDI killed renamable $x11, -16
				; CHECK-RV64-NO-REORDER-NEXT: renamable $x9 = SUB $x2, killed renamable $x11
				; CHECK-RV64-NO-REORDER-NEXT: $x2 = COPY renamable $x9
				; CHECK-RV64-NO-REORDER-NEXT: $x10 = ADDI $x8, -2048
				; CHECK-RV64-NO-REORDER-NEXT: $x10 = ADDI killed $x10, -64
				; CHECK-RV64-NO-REORDER-NEXT: PseudoCALL target-flags(riscv-call) @_Z7callee0Pi, csr_ilp32_lp64, implicit-def dead $x1, implicit $x10, implicit-def $x2
				; CHECK-RV64-NO-REORDER-NEXT: $x10 = COPY killed renamable $x18
				; CHECK-RV64-NO-REORDER-NEXT: PseudoCALL target-flags(riscv-call) @_Z7callee0Pi, csr_ilp32_lp64, implicit-def dead $x1, implicit $x10, implicit-def $x2
				; CHECK-RV64-NO-REORDER-NEXT: $x10 = COPY killed renamable $x9
				; CHECK-RV64-NO-REORDER-NEXT: PseudoCALL target-flags(riscv-call) @_Z7callee0Pi, csr_ilp32_lp64, implicit-def dead $x1, implicit $x10, implicit-def $x2
				; CHECK-RV64-NO-REORDER-NEXT: $x10 = ADDI $x8, -2048
				; CHECK-RV64-NO-REORDER-NEXT: $x10 = ADDI killed $x10, -68
				; CHECK-RV64-NO-REORDER-NEXT: PseudoCALL target-flags(riscv-call) @_Z7callee0Pi, csr_ilp32_lp64, implicit-def dead $x1, implicit $x10, implicit-def $x2
				; CHECK-RV64-NO-REORDER-NEXT: $x2 = COPY killed renamable $x19
				; CHECK-RV64-NO-REORDER-NEXT: $x2 = frame-destroy ADDI $x8, -2048
				; CHECK-RV64-NO-REORDER-NEXT: $x2 = frame-destroy ADDI killed $x2, -80
				; CHECK-RV64-NO-REORDER-NEXT: $x2 = frame-destroy ADDI $x2, 96
				; CHECK-RV64-NO-REORDER-NEXT: $x1 = LD $x2, 2024 :: (load (s64) from %stack.4)
				; CHECK-RV64-NO-REORDER-NEXT: $x8 = LD $x2, 2016 :: (load (s64) from %stack.5)
				; CHECK-RV64-NO-REORDER-NEXT: $x9 = LD $x2, 2008 :: (load (s64) from %stack.6)
				; CHECK-RV64-NO-REORDER-NEXT: $x18 = LD $x2, 2000 :: (load (s64) from %stack.7)
				; CHECK-RV64-NO-REORDER-NEXT: $x19 = LD $x2, 1992 :: (load (s64) from %stack.8)
				; CHECK-RV64-NO-REORDER-NEXT: $x2 = frame-destroy ADDI $x2, 2032
				; CHECK-RV64-NO-REORDER-NEXT: PseudoRET
				;
				; CHECK-RV64-REORDER-LABEL: name: _Z12stack_use_fpjj
				; CHECK-RV64-REORDER: liveins: $x10, $x11, $x1, $x9, $x18, $x19
				; CHECK-RV64-REORDER-NEXT: {{ $}}
				; CHECK-RV64-REORDER-NEXT: $x2 = frame-setup ADDI $x2, -2032
				; CHECK-RV64-REORDER-NEXT: frame-setup CFI_INSTRUCTION def_cfa_offset 2032
				; CHECK-RV64-REORDER-NEXT: SD killed $x1, $x2, 2024 :: (store (s64) into %stack.4)
				; CHECK-RV64-REORDER-NEXT: SD killed $x8, $x2, 2016 :: (store (s64) into %stack.5)
				; CHECK-RV64-REORDER-NEXT: SD killed $x9, $x2, 2008 :: (store (s64) into %stack.6)
				; CHECK-RV64-REORDER-NEXT: SD killed $x18, $x2, 2000 :: (store (s64) into %stack.7)
				; CHECK-RV64-REORDER-NEXT: SD killed $x19, $x2, 1992 :: (store (s64) into %stack.8)
				; CHECK-RV64-REORDER-NEXT: frame-setup CFI_INSTRUCTION offset $x1, -8
				; CHECK-RV64-REORDER-NEXT: frame-setup CFI_INSTRUCTION offset $x8, -16
				; CHECK-RV64-REORDER-NEXT: frame-setup CFI_INSTRUCTION offset $x9, -24
				; CHECK-RV64-REORDER-NEXT: frame-setup CFI_INSTRUCTION offset $x18, -32
				; CHECK-RV64-REORDER-NEXT: frame-setup CFI_INSTRUCTION offset $x19, -40
				; CHECK-RV64-REORDER-NEXT: $x8 = frame-setup ADDI $x2, 2032
				; CHECK-RV64-REORDER-NEXT: frame-setup CFI_INSTRUCTION def_cfa $x8, 0
				; CHECK-RV64-REORDER-NEXT: $x2 = frame-setup ADDI $x2, -96
				; CHECK-RV64-REORDER-NEXT: renamable $x19 = COPY $x2
				; CHECK-RV64-REORDER-NEXT: renamable $x10 = SLLI killed renamable $x10, 32
				; CHECK-RV64-REORDER-NEXT: renamable $x10 = SRLI killed renamable $x10, 30
				; CHECK-RV64-REORDER-NEXT: renamable $x10 = nuw ADDI killed renamable $x10, 15
				; CHECK-RV64-REORDER-NEXT: renamable $x10 = ANDI killed renamable $x10, -16
				; CHECK-RV64-REORDER-NEXT: renamable $x18 = SUB $x2, killed renamable $x10
				; CHECK-RV64-REORDER-NEXT: $x2 = COPY renamable $x18
				; CHECK-RV64-REORDER-NEXT: renamable $x11 = SLLI killed renamable $x11, 32
				; CHECK-RV64-REORDER-NEXT: renamable $x11 = SRLI killed renamable $x11, 30
				; CHECK-RV64-REORDER-NEXT: renamable $x11 = nuw ADDI killed renamable $x11, 15
				; CHECK-RV64-REORDER-NEXT: renamable $x11 = ANDI killed renamable $x11, -16
				; CHECK-RV64-REORDER-NEXT: renamable $x9 = SUB $x2, killed renamable $x11
				; CHECK-RV64-REORDER-NEXT: $x2 = COPY renamable $x9
				; CHECK-RV64-REORDER-NEXT: $x10 = ADDI $x8, -2048
				; CHECK-RV64-REORDER-NEXT: $x10 = ADDI killed $x10, -68
				; CHECK-RV64-REORDER-NEXT: PseudoCALL target-flags(riscv-call) @_Z7callee0Pi, csr_ilp32_lp64, implicit-def dead $x1, implicit $x10, implicit-def $x2
				; CHECK-RV64-REORDER-NEXT: $x10 = COPY killed renamable $x18
				; CHECK-RV64-REORDER-NEXT: PseudoCALL target-flags(riscv-call) @_Z7callee0Pi, csr_ilp32_lp64, implicit-def dead $x1, implicit $x10, implicit-def $x2
				; CHECK-RV64-REORDER-NEXT: $x10 = COPY killed renamable $x9
				; CHECK-RV64-REORDER-NEXT: PseudoCALL target-flags(riscv-call) @_Z7callee0Pi, csr_ilp32_lp64, implicit-def dead $x1, implicit $x10, implicit-def $x2
				; CHECK-RV64-REORDER-NEXT: $x10 = ADDI $x8, -52
				; CHECK-RV64-REORDER-NEXT: PseudoCALL target-flags(riscv-call) @_Z7callee0Pi, csr_ilp32_lp64, implicit-def dead $x1, implicit $x10, implicit-def $x2
				; CHECK-RV64-REORDER-NEXT: $x2 = COPY killed renamable $x19
				; CHECK-RV64-REORDER-NEXT: $x2 = frame-destroy ADDI $x8, -2048
				; CHECK-RV64-REORDER-NEXT: $x2 = frame-destroy ADDI killed $x2, -80
				; CHECK-RV64-REORDER-NEXT: $x2 = frame-destroy ADDI $x2, 96
				; CHECK-RV64-REORDER-NEXT: $x1 = LD $x2, 2024 :: (load (s64) from %stack.4)
				; CHECK-RV64-REORDER-NEXT: $x8 = LD $x2, 2016 :: (load (s64) from %stack.5)
				; CHECK-RV64-REORDER-NEXT: $x9 = LD $x2, 2008 :: (load (s64) from %stack.6)
				; CHECK-RV64-REORDER-NEXT: $x18 = LD $x2, 2000 :: (load (s64) from %stack.7)
				; CHECK-RV64-REORDER-NEXT: $x19 = LD $x2, 1992 :: (load (s64) from %stack.8)
				; CHECK-RV64-REORDER-NEXT: $x2 = frame-destroy ADDI $x2, 2032
				; CHECK-RV64-REORDER-NEXT: PseudoRET
				renamable $x19 = COPY $x2
				ADJCALLSTACKDOWN 0, 0, implicit-def dead $x2, implicit $x2
				renamable $x10 = SLLI killed renamable $x10, 32
				renamable $x10 = SRLI killed renamable $x10, 30
				renamable $x10 = nuw ADDI killed renamable $x10, 15
				renamable $x10 = ANDI killed renamable $x10, -16
				renamable $x18 = SUB $x2, killed renamable $x10
				$x2 = COPY renamable $x18
				ADJCALLSTACKUP 0, 0, implicit-def dead $x2, implicit $x2
				ADJCALLSTACKDOWN 0, 0, implicit-def dead $x2, implicit $x2
				renamable $x11 = SLLI killed renamable $x11, 32
				renamable $x11 = SRLI killed renamable $x11, 30
				renamable $x11 = nuw ADDI killed renamable $x11, 15
				renamable $x11 = ANDI killed renamable $x11, -16
				renamable $x9 = SUB $x2, killed renamable $x11
				$x2 = COPY renamable $x9
				ADJCALLSTACKUP 0, 0, implicit-def dead $x2, implicit $x2
				ADJCALLSTACKDOWN 0, 0, implicit-def dead $x2, implicit $x2
				$x10 = ADDI %stack.0, 0
				PseudoCALL target-flags(riscv-call) @_Z7callee0Pi, csr_ilp32_lp64, implicit-def dead $x1, implicit $x10, implicit-def $x2
				ADJCALLSTACKUP 0, 0, implicit-def dead $x2, implicit $x2
				ADJCALLSTACKDOWN 0, 0, implicit-def dead $x2, implicit $x2
				$x10 = COPY killed renamable $x18
				PseudoCALL target-flags(riscv-call) @_Z7callee0Pi, csr_ilp32_lp64, implicit-def dead $x1, implicit $x10, implicit-def $x2
				ADJCALLSTACKUP 0, 0, implicit-def dead $x2, implicit $x2
				ADJCALLSTACKDOWN 0, 0, implicit-def dead $x2, implicit $x2
				$x10 = COPY killed renamable $x9
				PseudoCALL target-flags(riscv-call) @_Z7callee0Pi, csr_ilp32_lp64, implicit-def dead $x1, implicit $x10, implicit-def $x2
				ADJCALLSTACKUP 0, 0, implicit-def dead $x2, implicit $x2
				ADJCALLSTACKDOWN 0, 0, implicit-def dead $x2, implicit $x2
				$x10 = ADDI %stack.1, 0
				PseudoCALL target-flags(riscv-call) @_Z7callee0Pi, csr_ilp32_lp64, implicit-def dead $x1, implicit $x10, implicit-def $x2
				ADJCALLSTACKUP 0, 0, implicit-def dead $x2, implicit $x2
				$x2 = COPY killed renamable $x19
				PseudoRET

				...
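In the CHECK lines above, frame offsets outside the 12-bit ADDI range appear as two ADDIs (for example `2047` then `37`, or `-2048` then `-64`), while reordered objects with small offsets need only one. The arithmetic can be sketched as follows. This only reproduces the pattern visible in the test output and is not the code LLVM uses; `splitAddiOffset` is a hypothetical name, and offsets are assumed to be reachable with at most two ADDIs.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Split an offset into at most two 12-bit signed ADDI immediates by
// saturating the first one. Returns {First, Second}; Second is 0 when a
// single ADDI is enough. Assumes Offset lies in [-4096, 4094].
std::pair<int64_t, int64_t> splitAddiOffset(int64_t Offset) {
  assert(Offset >= -4096 && Offset <= 4094);
  if (Offset > 2047)
    return {2047, Offset - 2047};
  if (Offset < -2048)
    return {-2048, Offset + 2048};
  return {Offset, 0};
}
```

An sp-relative offset of 2084 splits into 2047 and 37, as in the NO-REORDER output for `%stack.0`, whereas an offset of 16 fits a single ADDI, as in the REORDER output for `%stack.2`.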