This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Set preferred function alignment
ClosedPublic

Authored by NickGuy on Aug 9 2023, 8:08 AM.


Details

Reviewers

dmgreen
samtebbs

Commits

rGd65feccb1262: [ARM] Set preferred function alignment

Summary

Aligning functions yields small performance gains on embedded cores, more so in code that makes many calls to small functions. As with loop alignment, if a function fits within a single cache line, the overhead of fetching additional instructions is limited.
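To illustrate the cache-line argument, here is a minimal sketch that is not part of the patch. The 16-byte line size, the example offsets, and the helper names `linesTouched` and `alignUp` are all assumptions chosen for illustration; real fetch widths vary by core.

```cpp
#include <cassert>
#include <cstdint>

// Number of cache lines touched by a function occupying
// [start, start + size). lineSize is a hypothetical line size in bytes.
constexpr std::uint64_t linesTouched(std::uint64_t start, std::uint64_t size,
                                     std::uint64_t lineSize) {
  assert(size > 0 && lineSize > 0);
  return (start + size - 1) / lineSize - start / lineSize + 1;
}

// Round addr up to the next multiple of align (align must be a power of two).
constexpr std::uint64_t alignUp(std::uint64_t addr, std::uint64_t align) {
  return (addr + align - 1) & ~(align - 1);
}

// A 12-byte function placed at offset 0x1A straddles two 16-byte lines...
static_assert(linesTouched(0x1A, 12, 16) == 2, "unaligned start spans 2 lines");
// ...but fits in a single line once its start is rounded up to 0x20.
static_assert(alignUp(0x1A, 16) == 0x20, "0x1A rounds up to 0x20");
static_assert(linesTouched(alignUp(0x1A, 16), 12, 16) == 1, "aligned start fits");
```

The trade-off is the padding inserted before the function (6 bytes in this example), which is why a preferred alignment is a hint rather than a requirement.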

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

NickGuy created this revision. Aug 9 2023, 8:08 AM

Herald added a project: Restricted Project. Aug 9 2023, 8:08 AM

Herald added subscribers: hiraditya, kristof.beyls.

NickGuy requested review of this revision. Aug 9 2023, 8:08 AM

Can you add a test? Thanks.

Harbormaster completed remote builds in B251393: Diff 548629. Aug 9 2023, 10:24 AM

In D157514#4573318, @dmgreen wrote:

Can you add a test? Thanks.

Done and precommitted.

Harbormaster completed remote builds in B251616: Diff 548931. Aug 10 2023, 2:08 AM

Can you improve the summary to explain why this is being done? It's the same reason as we align loops.

Should this be done for all CPUs? I can see how that would make sense, but as far as I understand you are only really aiming for M-class devices. And in the past we haven't aligned loops for v6m devices (or some of the higher-end v7m devices).

llvm/test/CodeGen/ARM/preferred-function-alignment.ll
Line 1: It might be better to deliberately make this an Arm CPU (as opposed to Thumb) rather than a generic one. I believe that is what this is testing.

I've set the function alignment to the same value as the loop alignment, since in my testing the results were "best" when the two are equal.

Can you improve the summary to explain why this is being done? It's the same reason as we align loops.

Words seem to be failing me today. Hopefully the new summary makes sense.

Harbormaster completed remote builds in B252939: Diff 550727. Aug 16 2023, 6:55 AM

I agree it makes sense to use the same alignments, especially for Cortex-M CPUs. Can you update the LoopAlignment? Maybe call it "CodeAlignment" now? I'm not sure whether changing the name is better or not. The documentation can be changed to: "/// What alignment is preferred for loop bodies <and functions>, in log2(bytes)."

Otherwise LGTM. Thanks

This revision is now accepted and ready to land. Aug 16 2023, 7:47 AM

This revision was landed with ongoing or failed builds. Aug 16 2023, 9:33 AM

Closed by commit rGd65feccb1262: [ARM] Set preferred function alignment (authored by NickGuy).

This revision was automatically updated to reflect the committed changes.

NickGuy added a commit: rGd65feccb1262: [ARM] Set preferred function alignment.

Revision Contents

Path                                                    Size
llvm/lib/Target/ARM/ARMISelLowering.cpp                 1 line
llvm/lib/Target/ARM/ARMSubtarget.h                      2 lines
llvm/test/CodeGen/ARM/preferred-function-alignment.ll   23 lines

Diff 550789

llvm/lib/Target/ARM/ARMISelLowering.cpp

   // On ARM arguments smaller than 4 bytes are extended, so all arguments
   // are at least 4 bytes aligned.
   setMinStackArgumentAlignment(Align(4));

   // Prefer likely predicted branches to selects on out-of-order cores.
   PredictableSelectIsExpensive = Subtarget->getSchedModel().isOutOfOrder();

   setPrefLoopAlignment(Align(1ULL << Subtarget->getPrefLoopLogAlignment()));
+  setPrefFunctionAlignment(Align(1ULL << Subtarget->getPrefLoopLogAlignment()));

   setMinFunctionAlignment(Subtarget->isThumb() ? Align(2) : Align(4));

   if (Subtarget->isThumb() || Subtarget->isThumb2())
     setTargetDAGCombine(ISD::ABS);
 }

 bool ARMTargetLowering::useSoftFloat() const {
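The hunk above converts a log2 value into a byte alignment and sets it alongside the minimum function alignment. The following is a rough standalone model of how those two values could combine, not LLVM's actual code. The helper names `bytesFromLog2` and `effectiveFunctionAlign` are hypothetical, and the rule that `optsize` functions fall back to the minimum alignment is an assumption suggested by the separate ALIGN-CS check prefixes in the test for this revision.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Convert a log2 alignment (the form stored in PrefLoopLogAlignment) to
// bytes, mirroring the `1ULL << ...` expression in the diff.
constexpr std::uint64_t bytesFromLog2(unsigned logAlign) {
  return 1ULL << logAlign;
}

// Hypothetical model of the alignment a function ends up with, in bytes.
// Assumption: the preferred alignment is ignored when optimising for size.
constexpr std::uint64_t effectiveFunctionAlign(bool isThumb,
                                               unsigned prefLogAlign,
                                               bool optSize) {
  // Minimum alignment as set in the diff: 2 bytes for Thumb, 4 for Arm.
  const std::uint64_t minAlign = isThumb ? 2 : 4;
  return optSize ? minAlign : std::max(minAlign, bytesFromLog2(prefLogAlign));
}

// Thumb with a preferred log2 alignment of 2 gives 4-byte alignment...
static_assert(effectiveFunctionAlign(true, 2, false) == 4, "pref wins");
// ...unless optimising for size, where the 2-byte minimum applies.
static_assert(effectiveFunctionAlign(true, 2, true) == 2, "min under optsize");
// Arm mode with the default log2 value of 0 stays at the 4-byte minimum.
static_assert(effectiveFunctionAlign(false, 0, false) == 4, "min wins");
```

Because the default `PrefLoopLogAlignment` is 0 (1 byte), subtargets that do not set it are unaffected in this model: the minimum alignment always dominates.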

llvm/lib/Target/ARM/ARMSubtarget.h

Context: #include "ARMGenSubtargetInfo.inc"

   /// What kind of timing do load multiple/store multiple have (double issue,
   /// single issue etc).
   ARMLdStMultipleTiming LdStMultipleTiming = SingleIssue;

   /// The adjustment that we need to apply to get the operand latency from the
   /// operand cycle returned by the itinerary data for pre-ISel operands.
   int PreISelOperandLatencyAdjustment = 2;

-  /// What alignment is preferred for loop bodies, in log2(bytes).
+  /// What alignment is preferred for loop bodies and functions, in log2(bytes).
   unsigned PrefLoopLogAlignment = 0;

   /// The cost factor for MVE instructions, representing the multiple beats an
   // instruction can take. The default is 2, (set in initSubtargetFeatures so
   // that we can use subtarget features less than 2).
   unsigned MVEVectorCostFactor = 0;

   /// OptMinSize - True if we're optimising for minimum code size, equal to

llvm/test/CodeGen/ARM/preferred-function-alignment.ll

This file was added.

; RUN: llc -mtriple=arm-none-eabi -mcpu=cortex-m85 < %s | FileCheck --check-prefixes=CHECK,ALIGN-16,ALIGN-CS-16 %s
; RUN: llc -mtriple=arm-none-eabi -mcpu=cortex-m23 < %s | FileCheck --check-prefixes=CHECK,ALIGN-16,ALIGN-CS-16 %s

; RUN: llc -mtriple=arm-none-eabi -mcpu=cortex-a5 < %s | FileCheck --check-prefixes=CHECK,ALIGN-32,ALIGN-CS-32 %s
; RUN: llc -mtriple=arm-none-eabi -mcpu=cortex-m33 < %s | FileCheck --check-prefixes=CHECK,ALIGN-32,ALIGN-CS-16 %s
; RUN: llc -mtriple=arm-none-eabi -mcpu=cortex-m55 < %s | FileCheck --check-prefixes=CHECK,ALIGN-32,ALIGN-CS-16 %s


; CHECK-LABEL: test
; ALIGN-16: .p2align 1
; ALIGN-32: .p2align 2

define void @test() {
  ret void
}

; CHECK-LABEL: test_optsize
; ALIGN-CS-16: .p2align 1
; ALIGN-CS-32: .p2align 2

define void @test_optsize() optsize {
  ret void
}