This is an archive of the discontinued LLVM Phabricator instance.


Differential D88215

Add llvm.loop.unroll.threshold metadata
Abandoned · Public

Authored by timcorringham on Sep 24 2020, 4:17 AM.


Details

Reviewers

kuhar
arsenm
Meinersbur

Summary

Add new loop metadata llvm.loop.unroll.threshold to allow the initial target
specific unroll threshold value to be specified on a loop by loop basis.

The intention is to allow more nuanced hints, e.g. specifying a low threshold
value to indicate that a loop may be unrolled if it is cheap enough, rather than
using the all-or-nothing llvm.loop.unroll.disable metadata.

This new metadata is used when setting the default threshold in the
target-specific loop unrolling options, so it only has an effect on targets that
make use of it; in this change, only AMDGPU uses it.

This change was originally proposed as an AMDGPU specific change, but
following review comments (see https://reviews.llvm.org/D84779) is now
being presented as generic metadata.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

timcorringham created this revision. · Sep 24 2020, 4:17 AM

Herald added a reviewer: jdoerfert. · Sep 24 2020, 4:17 AM

Herald added a project: Restricted Project.

Herald added subscribers: llvm-commits, kerbowa, zzheng and 5 others.

timcorringham requested review of this revision. · Sep 24 2020, 4:17 AM

timcorringham edited the summary of this revision. · Sep 24 2020, 4:21 AM

Harbormaster completed remote builds in B72796: Diff 294012. · Sep 24 2020, 4:50 AM

timcorringham added a reviewer: kuhar. · Oct 9 2020, 6:04 AM

jdoerfert edited reviewers, added: arsenm, Meinersbur; removed: jdoerfert. · Oct 9 2020, 9:15 AM

Herald added subscribers: jdoerfert, wdng. · Oct 9 2020, 9:15 AM

As implemented, the threshold is specific to the AMDGPU backend. Is there any possibility of implementing it generally in LoopUnrollPass.cpp? If it is indeed intended to be processed by just a single backend, the attribute's name should reflect that, as with the amdgpu-unroll-threshold attribute.

Is the patch identical to D84779? The same concerns about the undefined semantics of "threshold" still apply.

I guess the loop unroll threshold metadata could be consumed directly by LoopUnrollPass.cpp, but that would cut out some functionality that is useful (at least for AMDGPU): the target-specific unroll options code for AMDGPU has heuristics that can adjust the unroll threshold. The metadata gives the starting point, but the value may be increased in the presence of certain constructs. If the metadata were used as the final threshold in LoopUnrollPass.cpp, then to get the same behaviour the front-end would have to perform the same analysis to adjust the threshold value, which would have an impact on compile time, duplicate code, etc.

As for whether the metadata should be llvm generic or amdgpu specific, I guess the question is whether any other targets would find this useful? My original approach was to make it amdgpu specific, but reviewers considered that it could be generic - hence this second review.

I'm not sure I'd describe the threshold semantics as undefined; I would describe them as imprecise. If a change has the effect of pushing code over a threshold (in either direction), the effect may be undesirable or unexpected, but equally it can mitigate the impact of changes that would otherwise be undesirable. In any case, the loop unroll pass already uses the threshold approach, so using any other control (such as a precise loop unroll count) would require significantly more effort: the unroll count would have to be calculated in advance, which may itself still use a threshold.
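The "pushing code over a threshold" effect can be illustrated with a simplified model of a full-unroll size check. The sketch assumes a size estimate of the shape (LoopSize - BEInsns) * TripCount + BEInsns, which is roughly how LoopUnrollPass sizes a fully unrolled loop; the function names and numbers are hypothetical, and the real pass applies further cost adjustments that are not modeled here.

```cpp
#include <cassert>
#include <cstdint>

// Simplified size of a fully unrolled loop: the per-iteration part of the
// body (LoopSize minus the BEInsns backedge instructions) is copied TripCount
// times, and the backedge instructions are counted once.
uint64_t estimateUnrolledSize(uint64_t LoopSize, uint64_t TripCount,
                              uint64_t BEInsns) {
  assert(LoopSize >= BEInsns && "LoopSize should not be less than BEInsns");
  return (LoopSize - BEInsns) * TripCount + BEInsns;
}

// Full unrolling is allowed only when the estimate stays within Threshold.
bool fitsThreshold(uint64_t LoopSize, uint64_t TripCount, uint64_t BEInsns,
                   uint64_t Threshold) {
  return estimateUnrolledSize(LoopSize, TripCount, BEInsns) <= Threshold;
}
```

With a body of 12 units and 2 backedge instructions, ten iterations come to exactly 102 and fit a threshold of 102; one more unit in the body gives 112, and the loop is no longer fully unrolled.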

Unlike the reviewers in D84779, I would be OK with it being target-specific. We already have a precedent for this in the form of the amdgpu-unroll-threshold function attribute, and this just applies the same information more specifically to loops. I don't see the issue of non-generality being raised in D68873.

I'd object to llvm.loop.unroll.threshold metadata without even the intent to define/implement it for other targets as well. The threshold number itself is inherently target-specific, so I think it is difficult to define a "threshold of 10000" that is comparable between AMD, X86, PowerPC, etc.
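The portability concern can be illustrated with a hypothetical model: the same loop body, priced by two invented cost tables, crosses a fixed threshold of 1000 on one target but not on the other. CostTable, loopCost, fullyUnrolls, and all the numbers are assumptions made for this sketch; real targets derive instruction costs from their TargetTransformInfo, and the unroller's size check is more involved than a plain product.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <map>
#include <string>
#include <vector>

// Hypothetical per-target cost of each opcode; the values are invented.
using CostTable = std::map<std::string, uint64_t>;

// Sum the cost of every instruction in a loop body.
// Throws std::out_of_range if an opcode has no entry in the table.
uint64_t loopCost(const std::vector<std::string> &Body, const CostTable &Costs) {
  uint64_t Total = 0;
  for (const std::string &Opcode : Body)
    Total += Costs.at(Opcode);
  return Total;
}

// Deliberately crude full-unroll check: body cost times trip count.
bool fullyUnrolls(uint64_t BodyCost, uint64_t TripCount, uint64_t Threshold) {
  assert((TripCount == 0 ||
          BodyCost <= std::numeric_limits<uint64_t>::max() / TripCount) &&
         "size estimate overflows");
  return BodyCost * TripCount <= Threshold;
}
```

For a body of load, store, add, icmp, and br with 100 iterations, a table that prices every opcode at 1 gives 500 and unrolls under a threshold of 1000, while a table that prices loads and stores at 4 gives 1100 and does not.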

Sorry you're caught between reviewers' opinions.

OK, thanks for the comments. I'll abandon the generic form of the metadata and have a re-think...

timcorringham mentioned this in D84779: [AMDGPU] Add amdgpu specific loop threshold metadata. · Oct 21 2020, 8:02 AM

Revision Contents

llvm/docs/LangRef.rst: 13 lines
llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp: 20 lines
llvm/test/Transforms/LoopUnroll/AMDGPU/unroll-threshold.ll: 113 lines

Diff 294012

llvm/docs/LangRef.rst


 '``llvm.loop.unroll.followup_remainder``' Metadata
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 This metadata defines which loop attributes the remainder loop after
 partial/runtime unrolling will have. See
 :ref:`Transformation Metadata <transformation-metadata>` for details.

+'``llvm.loop.unroll.threshold``' Metadata
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This metadata suggests a default unroll threshold value for the loop. This
+metadata is used when setting the target specific unroll options rather than
+within the core unroll pass, so only has effect for targets that make use of it.
+This metadata has a single integer operand specifying the threshold value, for
+example:
+
+.. code-block:: llvm
+
+   !0 = !{!"llvm.loop.unroll.threshold", i32 250}
+
 '``llvm.loop.unroll_and_jam``'
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 This metadata is treated very similarly to the ``llvm.loop.unroll`` metadata
 above, but affect the unroll and jam pass. In addition any loop with
 ``llvm.loop.unroll`` metadata but no ``llvm.loop.unroll_and_jam`` metadata will
 disable unroll and jam (so ``llvm.loop.unroll`` metadata will be left to the
 unroller, plus ``llvm.loop.unroll.disable`` metadata will disable unroll and jam

llvm/lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp

In void AMDGPUTTIImpl::getUnrollingPreferences(Loop *L, ScalarEvolution &SE, ...):

   UP.Partial = true;

   // TODO: Do we want runtime unrolling?

   // Maximum alloca size than can fit registers. Reserve 16 registers.
   const unsigned MaxAlloca = (256 - 16) * 4;
   unsigned ThresholdPrivate = UnrollThresholdPrivate;
   unsigned ThresholdLocal = UnrollThresholdLocal;

+  // If this loop has the llvm.loop.unroll.threshold metadata we will use the
+  // provided threshold value as the default for Threshold
+  if (MDNode *LoopUnrollThreshold =
+          findOptionMDForLoop(L, "llvm.loop.unroll.threshold")) {
+    if (LoopUnrollThreshold->getNumOperands() == 2) {
+      ConstantInt *MetaThresholdValue = mdconst::extract_or_null<ConstantInt>(
+          LoopUnrollThreshold->getOperand(1));
+      if (MetaThresholdValue) {
+        // We will also use the supplied value for PartialThreshold for now.
+        // We may introduce additional metadata if it becomes necessary in the
+        // future.
+        UP.Threshold = MetaThresholdValue->getSExtValue();
+        UP.PartialThreshold = UP.Threshold;
+        ThresholdPrivate = std::min(ThresholdPrivate, UP.Threshold);
+        ThresholdLocal = std::min(ThresholdLocal, UP.Threshold);
+      }
+    }
+  }
+
   unsigned MaxBoost = std::max(ThresholdPrivate, ThresholdLocal);
   for (const BasicBlock *BB : L->getBlocks()) {
     const DataLayout &DL = BB->getModule()->getDataLayout();
     unsigned LocalGEPsSeen = 0;

     if (llvm::any_of(L->getSubLoops(), [BB](const Loop* SubLoop) {
                return SubLoop->contains(BB); }))
       continue; // Block belongs to an inner loop.
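The setup added to getUnrollingPreferences can be modeled outside LLVM to see how the values interact. This is a hypothetical sketch, not the patch itself: UnrollSetup and computeSetup are invented names; FnThreshold stands for whatever UP.Threshold held before this code runs (per the test comments, the AMDGPU default or the amdgpu-unroll-threshold attribute); and std::optional stands in for the findOptionMDForLoop and extract_or_null checks, which yield no value when the node is missing, does not have exactly two operands, or has a second operand that is not an integer constant.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical mirror of the values the patch touches: UP.Threshold,
// UP.PartialThreshold, the local ThresholdPrivate/ThresholdLocal, and MaxBoost.
struct UnrollSetup {
  unsigned Threshold;
  unsigned PartialThreshold;
  unsigned ThresholdPrivate;
  unsigned ThresholdLocal;
  unsigned MaxBoost;
};

// FnThreshold/FnPartial: what UP held before the metadata check (assumed).
// BoostPrivate/BoostLocal: stand-ins for UnrollThresholdPrivate/Local.
// MDValue: the integer operand of a well-formed llvm.loop.unroll.threshold
// node, or std::nullopt if the node is absent or malformed.
UnrollSetup computeSetup(unsigned FnThreshold, unsigned FnPartial,
                         unsigned BoostPrivate, unsigned BoostLocal,
                         std::optional<int64_t> MDValue) {
  UnrollSetup S{FnThreshold, FnPartial, BoostPrivate, BoostLocal, 0};
  if (MDValue) {
    // The patch assigns getSExtValue() straight to the unsigned UP.Threshold,
    // so a negative operand would wrap; this sketch asserts instead.
    assert(*MDValue >= 0 && "negative unroll threshold");
    S.Threshold = static_cast<unsigned>(*MDValue);
    S.PartialThreshold = S.Threshold;
    // The metadata value also caps both boost limits.
    S.ThresholdPrivate = std::min(S.ThresholdPrivate, S.Threshold);
    S.ThresholdLocal = std::min(S.ThresholdLocal, S.Threshold);
  }
  S.MaxBoost = std::max(S.ThresholdPrivate, S.ThresholdLocal);
  return S;
}
```

Feeding in the two override cases from the test (attribute 1000 with metadata 100, and attribute 100 with metadata 1000) gives thresholds of 100 and 1000 respectively, which is the precedence the test comments describe. Because of the two std::min calls, MaxBoost never exceeds the metadata value when the metadata is present.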

llvm/test/Transforms/LoopUnroll/AMDGPU/unroll-threshold.ll

This file was added.

; RUN: opt < %s -S -mtriple=amdgcn-- -loop-unroll | FileCheck %s

; Check the handling of llvm.loop.unroll.threshold metadata which can be used to
; set the default threshold for a loop. This metadata overrides both the AMDGPU
; default, and any value specified by the amdgpu-unroll-threshold function attribute
; (which sets a threshold for all loops in the function).

; Check that the loop in unroll_default is not fully unrolled using the default
; unroll threshold
; CHECK-LABEL: @unroll_default
; CHECK: entry:
; CHECK: br i1 %cmp
; CHECK: ret void

@in = internal unnamed_addr global i32* null, align 8
@out = internal unnamed_addr global i32* null, align 8

define void @unroll_default() {
entry:
  br label %do.body

do.body: ; preds = %entry
  %i.0 = phi i32 [ 0, %entry ], [ %inc, %do.body ]
  %v1 = load i64, i64* bitcast (i32** @in to i64*), align 8
  store i64 %v1, i64* bitcast (i32** @out to i64*), align 8
  %inc = add nsw i32 %i.0, 1
  %cmp = icmp slt i32 %inc, 100
  br i1 %cmp, label %do.body, label %do.end

do.end: ; preds = %do.body
  ret void
}

; Check that the same loop in unroll_full is fully unrolled when the default
; unroll threshold is increased by use of the llvm.loop.unroll.threshold metadata
; CHECK-LABEL: @unroll_full
; CHECK: entry:
; CHECK-NOT: br i1 %cmp
; CHECK: ret void

define void @unroll_full() {
entry:
  br label %do.body

do.body: ; preds = %entry
  %i.0 = phi i32 [ 0, %entry ], [ %inc, %do.body ]
  %v1 = load i64, i64* bitcast (i32** @in to i64*), align 8
  store i64 %v1, i64* bitcast (i32** @out to i64*), align 8
  %inc = add nsw i32 %i.0, 1
  %cmp = icmp slt i32 %inc, 100
  br i1 %cmp, label %do.body, label %do.end, !llvm.loop !1

do.end: ; preds = %do.body
  ret void
}

; Check that the same loop in override_no_unroll is not unrolled when a high default
; unroll threshold specified using the amdgpu-unroll-threshold function attribute
; is overridden by a low threshold using the llvm.loop.unroll.threshold metadata

; CHECK-LABEL: @override_no_unroll
; CHECK: entry:
; CHECK: br i1 %cmp
; CHECK: ret void

define void @override_no_unroll() #0 {
entry:
  br label %do.body

do.body: ; preds = %entry
  %i.0 = phi i32 [ 0, %entry ], [ %inc, %do.body ]
  %v1 = load i64, i64* bitcast (i32** @in to i64*), align 8
  store i64 %v1, i64* bitcast (i32** @out to i64*), align 8
  %inc = add nsw i32 %i.0, 1
  %cmp = icmp slt i32 %inc, 100
  br i1 %cmp, label %do.body, label %do.end, !llvm.loop !3

do.end: ; preds = %do.body
  ret void
}

; Check that the same loop in override_unroll is fully unrolled when a low default
; unroll threshold specified using the amdgpu-unroll-threshold function attribute
; is overridden by a high threshold using the llvm.loop.unroll.threshold metadata

; CHECK-LABEL: @override_unroll
; CHECK: entry:
; CHECK-NOT: br i1 %cmp
; CHECK: ret void

define void @override_unroll() #1 {
entry:
  br label %do.body

do.body: ; preds = %entry
  %i.0 = phi i32 [ 0, %entry ], [ %inc, %do.body ]
  %v1 = load i64, i64* bitcast (i32** @in to i64*), align 8
  store i64 %v1, i64* bitcast (i32** @out to i64*), align 8
  %inc = add nsw i32 %i.0, 1
  %cmp = icmp slt i32 %inc, 100
  br i1 %cmp, label %do.body, label %do.end, !llvm.loop !1

do.end: ; preds = %do.body
  ret void
}

attributes #0 = { "amdgpu-unroll-threshold"="1000" }
attributes #1 = { "amdgpu-unroll-threshold"="100" }

!1 = !{!1, !2}
!2 = !{!"llvm.loop.unroll.threshold", i32 1000}
!3 = !{!3, !4}
!4 = !{!"llvm.loop.unroll.threshold", i32 100}