This is an archive of the discontinued LLVM Phabricator instance.

Paths

- llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
- llvm/test/Transforms/LoopVectorize/assume.ll
- llvm/test/Transforms/LoopVectorize/scalable-assume.ll
- llvm/test/Transforms/LoopVectorize/scalable-lifetime.ll
- llvm/test/Transforms/LoopVectorize/scalable-noalias-scope-decl.ll

Differential D107284

[LoopVectorize] Improve vectorisation of some intrinsics by treating them as uniform
ClosedPublic

Authored by david-arm on Aug 2 2021, 8:13 AM.


Details

Reviewers

sdesmalen
kmclaughlin
fhahn
spatel
nikic

Commits

rG3fd96e1b2e12: [LoopVectorize] Improve vectorisation of some intrinsics by treating them as…
rG95800da91493: [LoopVectorize] Add support for replication of more intrinsics with scalable…

Summary

This patch adds more instructions to the Uniforms list, for example certain
intrinsics that are uniform by definition or whose operands are loop invariant.
This list includes:

- The intrinsics 'experimental.noalias.scope.decl' and 'sideeffect', which are always uniform by definition.
- The intrinsics 'lifetime.start', 'lifetime.end' and 'assume', when their input operands are loop invariant.
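A minimal sketch of the Uniforms-list rule above, modelled outside LLVM so it compiles on its own. `IntrinsicKind` and `addsToUniforms` are hypothetical stand-ins for `Intrinsic::ID` and the new check in `collectLoopUniforms`; they are not LLVM APIs. The `OperandsLoopInvariant` flag assumes the caller has already done the equivalent of `TheLoop->hasLoopInvariantOperands(&I)`.

```cpp
// Hypothetical stand-in for the few Intrinsic::ID values this patch touches.
// This is a toy model, not LLVM's real enum.
enum class IntrinsicKind {
  SideEffect,       // llvm.sideeffect
  NoAliasScopeDecl, // llvm.experimental.noalias.scope.decl
  Assume,           // llvm.assume
  LifetimeStart,    // llvm.lifetime.start
  LifetimeEnd,      // llvm.lifetime.end
  Other
};

// Toy model of the new check in collectLoopUniforms: a call to one of the
// listed intrinsics joins the Uniforms list only when all of its operands are
// loop invariant. llvm.sideeffect takes no arguments and
// llvm.experimental.noalias.scope.decl only takes metadata, so those two
// should always satisfy the condition.
constexpr bool addsToUniforms(IntrinsicKind K, bool OperandsLoopInvariant) {
  switch (K) {
  case IntrinsicKind::SideEffect:
  case IntrinsicKind::NoAliasScopeDecl:
  case IntrinsicKind::Assume:
  case IntrinsicKind::LifetimeStart:
  case IntrinsicKind::LifetimeEnd:
    return OperandsLoopInvariant;
  case IntrinsicKind::Other:
    break;
  }
  return false;
}

// Usage: an assume on a loop-invariant condition becomes uniform here; an
// assume on a loop-variant condition is not added by this check.
static_assert(addsToUniforms(IntrinsicKind::Assume, true), "");
static_assert(!addsToUniforms(IntrinsicKind::Assume, false), "");
```

The switch lists the intrinsics explicitly rather than accepting any call with invariant operands, which keeps the sketch as narrow as the patch itself.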

Also, in VPRecipeBuilder::handleReplication we check whether an instruction is
uniform based purely on whether the instruction lives in the Uniforms
list. However, there are certain cases where calls to some intrinsics can
effectively be treated as uniform too. Therefore, we now also treat the
following cases as uniform for scalable vectors:

- If the 'assume' intrinsic's operand is not loop invariant, we are still free to treat the call as uniform, since it is only a performance hint. We get the benefit for the first lane.
- When the input pointers for 'lifetime.start' and 'lifetime.end' are loop variant, we still treat the calls as uniform for scalable vectors. These intrinsics only do anything useful when the pointer refers to a stack object, which suggests the pointer should be uniform in practice. If the pointer does not refer to a stack object, the effect is to poison the object, which still allows the call to be removed.
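The extra decision made in handleReplication can be sketched as a small predicate. This is a hypothetical toy model: `IntrinsicKind`, `replicateAsUniform` and the `ScalableVF` flag are illustrative names, not LLVM APIs. It only captures the ordering described above: the Uniforms list wins, fixed-width VFs keep full scalarization, and the override applies only to scalable VFs.

```cpp
// Hypothetical toy model of the extra decision in
// VPRecipeBuilder::handleReplication; none of these names are LLVM APIs.
enum class IntrinsicKind { Assume, LifetimeStart, LifetimeEnd, Other };

// Returns true when the replicate recipe should be uniform, i.e. emit a single
// scalar call for lane 0 instead of one call per lane.
constexpr bool replicateAsUniform(bool InUniformsList, IntrinsicKind K,
                                  bool ScalableVF) {
  if (InUniformsList)
    return true;
  // Fixed-width VFs can always fall back on full scalarization, so the
  // override is only applied when the lane count is unknown at compile time.
  if (!ScalableVF)
    return false;
  switch (K) {
  case IntrinsicKind::Assume:        // a hint for lane 0 beats no hint at all
  case IntrinsicKind::LifetimeStart: // only useful for stack objects
  case IntrinsicKind::LifetimeEnd:
    return true;
  case IntrinsicKind::Other:
    break;
  }
  return false;
}
```

For example, `replicateAsUniform(false, IntrinsicKind::Assume, true)` is true, while the same call with a fixed-width VF is false and leaves the assume to be scalarized per lane.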

I have updated the assume test for fixed width, since we now treat it
as uniform:

Transforms/LoopVectorize/assume.ll
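The fixed-width test change follows from simple arithmetic. Assuming the original assume.ll RUN options (`-force-vector-width=2 -force-vector-interleave=2`, quoted in the scalable-assume.ll diff), replicating a loop-invariant assume emits 2 × 2 = 4 calls per vector iteration, whereas a uniform call is emitted once per interleaved part, i.e. 2. That matches the two `CHECK-NEXT` lines per condition removed from assume.ll. The helper below is hypothetical and only restates that count.

```cpp
// Hypothetical helper, not part of LLVM: number of llvm.assume calls emitted
// per vector iteration for one scalar assume, given a fixed-width VF and an
// interleave count UF. A replicated call is emitted for every lane of every
// part; a uniform call is emitted once per part (for lane 0).
constexpr unsigned assumeCallsPerVectorIteration(unsigned VF, unsigned UF,
                                                 bool Uniform) {
  return Uniform ? UF : VF * UF;
}

// VF = 2, UF = 2, as in the assume.ll RUN line.
static_assert(assumeCallsPerVectorIteration(2, 2, false) == 4, "");
static_assert(assumeCallsPerVectorIteration(2, 2, true) == 2, "");
```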

I've also added new scalable vectorisation tests for other intrinsics:

Transforms/LoopVectorize/scalable-assume.ll
Transforms/LoopVectorize/scalable-lifetime.ll
Transforms/LoopVectorize/scalable-noalias-scope-decl.ll

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

david-arm created this revision. Aug 2 2021, 8:13 AM

Herald added subscribers: rogfer01, hiraditya. Aug 2 2021, 8:13 AM

david-arm requested review of this revision. Aug 2 2021, 8:13 AM

Herald added a project: Restricted Project. Aug 2 2021, 8:13 AM

Herald added subscribers: llvm-commits, vkmr.

Harbormaster completed remote builds in B117457: Diff 363482. Aug 2 2021, 8:13 AM

All the scalable-xyz tests were just copied from the original fixed-width xyz tests and adapted for scalable vectors.

david-arm added a parent revision: D107157: [NFC] Clean up tests in test/Transforms/LoopVectorize/assume.ll. Aug 2 2021, 8:48 AM

sdesmalen added inline comments. Aug 3 2021, 4:03 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8951	Rather than having a switch within a switch, would it make sense to just bail out with `false` if I is not a CallInst? Or one step further, specialize the function to `isUniformIntrinsicCall`, which takes an `IntrinsicInst` instead of `Instruction`?
8954	nit: The switch is missing a default case, which may result in build warnings.
8961	It would be nice if there were a better way to determine whether the operands are uniform (e.g. if the pointer is a bitcast, inside the loop, of a value that's otherwise loop-invariant, that could also be considered uniform) than testing it like you do here. Maybe you could additionally check if the operands are either uniform or scalar, but I'm not sure if we care (enough), since in practically all cases the operands will be loop-invariant. nit: how about writing this expression as: `return IsScalable || OrigLoop->hasLoopInvariantOperands(I);`
8970–8972	I guess the assume could be widened and the lanes could be `llvm.reduce.or`'ed together, but I doubt that LLVM will be able to use that knowledge in practice. I think dropping information about the other lanes is acceptable, and still an improvement over not vectorizing at all or having the compiler crash while trying to scalarize :)
8975–8976	Regarding "as we don't support vectorizing allocas for scalable vectors": I don't think this is the argument for why we assume the value to be uniform. The `alloca` could be outside the loop, and only the pointers might not be loop-invariant. It's more that any non-uniform pointers probably won't be recognized by other passes: ValueTracking's `findAllocaForValue` only looks through simple bitcasts/inttoptr/ptrtoint/phi instructions, so the intrinsic is ignored for anything more complicated when the alloca can't be determined.
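To make the point about non-uniform pointers concrete, here is a toy model of a lookup that, as the comment describes, only looks through simple casts and phis. The `Node`/`Kind` types and `findAllocaToy` are hypothetical and do not reproduce ValueTracking's `findAllocaForValue`, whose exact rules may differ. A pointer produced by anything the walk cannot see through (modelled as `Kind::Other`) yields no alloca, which is the situation in which the lifetime intrinsic would be ignored.

```cpp
// Hypothetical toy IR node; a stand-in for LLVM's Value hierarchy, not the
// real thing.
enum class Kind { Alloca, BitCast, IntToPtr, PtrToInt, Phi, Other };

struct Node {
  Kind K;
  const Node *Op0; // cast source, or first phi incoming value
  const Node *Op1; // second phi incoming value (unused for casts)
};

// Toy version of the lookup described in the review: follow simple casts and
// two-input phis back to a single alloca, and give up (nullptr) on anything
// else. Depth bounds the walk so cyclic phis terminate; hitting the bound
// conservatively yields nullptr.
constexpr const Node *findAllocaToy(const Node *V, unsigned Depth = 0) {
  if (!V || Depth > 8)
    return nullptr;
  switch (V->K) {
  case Kind::Alloca:
    return V;
  case Kind::BitCast:
  case Kind::IntToPtr:
  case Kind::PtrToInt:
    return findAllocaToy(V->Op0, Depth + 1);
  case Kind::Phi: {
    const Node *A = findAllocaToy(V->Op0, Depth + 1);
    const Node *B = findAllocaToy(V->Op1, Depth + 1);
    // Both incoming values must lead back to the same alloca.
    return (A && A == B) ? A : nullptr;
  }
  case Kind::Other:
    break;
  }
  return nullptr;
}
```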

Matt added a subscriber: Matt. Aug 3 2021, 6:55 AM

Addressed some review comments.

david-arm marked 3 inline comments as done. Aug 3 2021, 7:22 AM

Harbormaster completed remote builds in B117632: Diff 363732. Aug 3 2021, 7:49 AM

Addressed review comment by renaming function to isUniformIntrinsicCall

fhahn added inline comments. Aug 4 2021, 12:57 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8961	can you add a comment here saying why we need to do another check?
8968	`may` here makes it sound like it is not clear if it is better. The patch/code decides it is better, perhaps make that clear in the wording?
8973	Can you keep the reasoning here in line with `LangRef`? There's no reference to `alloca` there, just stack objects. Also, if it is not a stack object, the intrinsic still has an effect: poisoning the object. This should still allow removing the call.
llvm/lib/Transforms/Vectorize/VPRecipeBuilder.h
176 ↗	(On Diff #363986)	this only supports intrinsics, can it take `IntrinsicInst`?
llvm/test/Transforms/LoopVectorize/scalable-assume.ll
90	is this needed? (same for some other tests)
93	is this needed?
97	is this needed?

david-arm added inline comments. Aug 4 2021, 1:00 AM

llvm/test/Transforms/LoopVectorize/scalable-assume.ll
90	Maybe not? I didn't write these tests, but copied them from assume.ll. I can clean them up though.

Harbormaster completed remote builds in B117827: Diff 363986. Aug 4 2021, 1:09 AM

Addressed review comments: tidied up tests, renamed new function, improved comments to be more in line with the LangRef.

david-arm marked 7 inline comments as done. Aug 4 2021, 5:10 AM

Harbormaster completed remote builds in B117877: Diff 364063. Aug 4 2021, 5:25 AM

sdesmalen added inline comments. Aug 4 2021, 6:14 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8952–8953	nit: I think this can simply be: switch (CI->getIntrinsicID()) {
8975	Perhaps not something to change in this patch, but given the change I've made in D107286, I wonder if it makes sense to mark any operation that has only loop-invariant operands as 'uniform' in `collectLoopUniforms`, e.g. if (all_of(I.operands(), [&](Value *V) { return isOutOfScope(V); })) addToWorklistIfAllowed(&I); @fhahn any thoughts on this? As a smaller first step, you could do the above suggestion only for this limited set of intrinsics (and then override IsUniform in handleReplication for this set of intrinsics if the VF is scalable). That said, I don't really want to hold up this patch too much given that it fixes the issue sufficiently, and it's the last piece of the puzzle to build LNT and Clang with scalable vectorization enabled. So maybe such refactoring can be done in a follow-up patch?
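The suggested generalization reduces to an operand predicate. A minimal sketch, assuming each operand has already been classified as in or out of the loop's scope (the role `isOutOfScope` plays in the comment). `allOperandsOutOfScope` is a hypothetical name, written as a plain constexpr loop rather than `llvm::all_of` so that it compiles without LLVM.

```cpp
#include <cstddef>

// Sketch of the suggested rule: an instruction whose operands are all out of
// the loop's scope is marked uniform. OutOfScope holds one flag per operand.
// Equivalent to all_of(I.operands(), isOutOfScope); like all_of, it returns
// true for an instruction with no operands.
constexpr bool allOperandsOutOfScope(const bool *OutOfScope,
                                     std::size_t NumOperands) {
  for (std::size_t Idx = 0; Idx < NumOperands; ++Idx)
    if (!OutOfScope[Idx])
      return false;
  return true;
}
```

One consequence of the `all_of` semantics is that an operand-free instruction passes trivially. That fits the summary's treatment of `llvm.sideeffect` as always uniform, but it also means a broader rule would need the usual legality checks before being applied to arbitrary instructions.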

sdesmalen mentioned this in D107436: [Zorg] Add AArch64 SVE Vector-Length-Agnostic bots. Aug 4 2021, 6:31 AM

Changed the patch to add some intrinsics to the Uniforms list where possible and inlined parts of isUniformIntrinsicCall into handleReplication instead.

Thanks @david-arm. LGTM with nits addressed!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5431–5432	nit: this can be: switch (II->getIntrinsicID()) { There is no need for `getVectorIntrinsicIDForCall`
8964–8965	nit: same comment about using `cast<IntrinsicInst>(I)->getIntrinsicID()`
8984	nit: maybe move this condition to the if-statement, and set `IsUniform = true` directly?

This revision is now accepted and ready to land. Aug 4 2021, 9:15 AM

fhahn added inline comments. Aug 4 2021, 9:54 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8963	Should this directly be guarded by `isScalable`? It would also be good to mention up front here that this only applies to scalable vectors, and why it is needed.

Harbormaster completed remote builds in B117937: Diff 364151. Aug 4 2021, 10:07 AM

Addressed review comments.

david-arm marked 8 inline comments as done. Aug 5 2021, 1:38 AM

Harbormaster completed remote builds in B118100: Diff 364381. Aug 5 2021, 2:30 AM

This revision was landed with ongoing or failed builds. Aug 5 2021, 7:17 AM

Closed by commit rG95800da91493: [LoopVectorize] Add support for replication of more intrinsics with scalable… (authored by david-arm).

This revision was automatically updated to reflect the committed changes.

david-arm added a commit: rG95800da91493: [LoopVectorize] Add support for replication of more intrinsics with scalable….

LGTM, thanks. Please address the pre-merge lint issue and also make sure the title + description of the patch match the latest version of the code; at the moment it is not in sync with the code.

In D107284#2928517, @fhahn wrote:

> LGTM, thanks. Please address the pre-merge lint issue and also make sure the title + description of the patch match the latest version of the code; at the moment it is not in sync with the code.

To be more specific, the title is not accurate because it implies more instructions are replicated, when in fact it does the opposite; it does not require replicating certain calls because it treats them as uniform. It also is not limited to scalable vectors.

david-arm added a reverting change: rG43a5c750d183: Revert "[LoopVectorize] Add support for replication of more intrinsics with…. Aug 6 2021, 1:48 AM

david-arm added a commit: rG3fd96e1b2e12: [LoopVectorize] Improve vectorisation of some intrinsics by treating them as…. Aug 6 2021, 2:13 AM

david-arm retitled this revision from [LoopVectorize] Add support for replication of more intrinsics with scalable vectors to [LoopVectorize] Improve vectorisation of some intrinsics by treating them as uniform. Aug 6 2021, 2:14 AM

david-arm mentioned this in D107749: Backport of D107284 to LLVM 13 branch. Aug 9 2021, 1:53 AM

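Revision Contents

Path: Size
- llvm/lib/Transforms/Vectorize/LoopVectorize.cpp: 44 lines
- llvm/test/Transforms/LoopVectorize/assume.ll: 4 lines
- llvm/test/Transforms/LoopVectorize/scalable-assume.ll (copied from assume.ll): 86 lines
- llvm/test/Transforms/LoopVectorize/scalable-lifetime.ll: 81 lines
- llvm/test/Transforms/LoopVectorize/scalable-noalias-scope-decl.ll: 127 lines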

Diff 364457

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp


… 5,421 lines not shown … void LoopVectorizationCostModel::collectLoopUniforms(ElementCount VF) {
   // it does not imply that all lanes produce the same value (e.g. this is not
   // the usual meaning of uniform)
   SetVector<Value *> HasUniformUse;
 
   // Scan the loop for instructions which are either a) known to have only
   // lane 0 demanded or b) are uses which demand only lane 0 of their operand.
   for (auto *BB : TheLoop->blocks())
     for (auto &I : *BB) {
+      if (IntrinsicInst *II = dyn_cast<IntrinsicInst>(&I)) {
+        switch (II->getIntrinsicID()) {
+        case Intrinsic::sideeffect:
+        case Intrinsic::experimental_noalias_scope_decl:
+        case Intrinsic::assume:
+        case Intrinsic::lifetime_start:
+        case Intrinsic::lifetime_end:
+          if (TheLoop->hasLoopInvariantOperands(&I))
+            addToWorklistIfAllowed(&I);
+        default:
+          break;
+        }
+      }
+
       // If there's no pointer operand, there's nothing to do.
       auto *Ptr = getLoadStorePointerOperand(&I);
       if (!Ptr)
         continue;
 
       // A uniform memory op is itself uniform. We exclude uniform stores
       // here as they demand the last lane, not the first one.
       if (isa<LoadInst>(I) && Legal->isUniformMemOp(I))
… 3,491 lines not shown … for (VPWidenPHIRecipe *R : PhisToFix) {
     auto *PN = cast<PHINode>(R->getUnderlyingValue());
     VPRecipeBase *IncR =
         getRecipe(cast<Instruction>(PN->getIncomingValueForBlock(OrigLatch)));
     R->addOperand(IncR->getVPSingleValue());
   }
 }
 
 VPBasicBlock *VPRecipeBuilder::handleReplication(
     Instruction *I, VFRange &Range, VPBasicBlock *VPBB,
     VPlanPtr &Plan) {
   bool IsUniform = LoopVectorizationPlanner::getDecisionAndClampRange(
       [&](ElementCount VF) { return CM.isUniformAfterVectorization(I, VF); },
       Range);
 
   bool IsPredicated = LoopVectorizationPlanner::getDecisionAndClampRange(
       [&](ElementCount VF) { return CM.isPredicatedInst(I); }, Range);
 
+  // Even if the instruction is not marked as uniform, there are certain
+  // intrinsic calls that can be effectively treated as such, so we check for
+  // them here. Conservatively, we only do this for scalable vectors, since
+  // for fixed-width VFs we can always fall back on full scalarization.
+  if (!IsUniform && Range.Start.isScalable() && isa<IntrinsicInst>(I)) {
+    switch (cast<IntrinsicInst>(I)->getIntrinsicID()) {
+    case Intrinsic::assume:
+    case Intrinsic::lifetime_start:
+    case Intrinsic::lifetime_end:
+      // For scalable vectors if one of the operands is variant then we still
+      // want to mark as uniform, which will generate one instruction for just
+      // the first lane of the vector. We can't scalarize the call in the same
+      // way as for fixed-width vectors because we don't know how many lanes
+      // there are.
+      //
+      // The reasons for doing it this way for scalable vectors are:
+      //   1. For the assume intrinsic generating the instruction for the first
+      //      lane is still better than not generating any at all. For
+      //      example, the input may be a splat across all lanes.
+      //   2. For the lifetime start/end intrinsics the pointer operand only
+      //      does anything useful when the input comes from a stack object,
+      //      which suggests it should always be uniform. For non-stack objects
+      //      the effect is to poison the object, which still allows us to
+      //      remove the call.
+      IsUniform = true;
+    default:
+      break;
+    }
+  }
+
   auto *Recipe = new VPReplicateRecipe(I, Plan->mapToVPValues(I->operands()),
                                        IsUniform, IsPredicated);
   setRecipe(I, Recipe);
   Plan->addVPValue(I, Recipe);
 
   // Find if I uses a predicated instruction. If so, it will use its scalar
   // value. Avoid hoisting the insert-element which packs the scalar value into
   // a vector value, as that happens iff all users use the vector value.
… 1,544 lines not shown …

llvm/test/Transforms/LoopVectorize/assume.ll

This file was copied to llvm/test/Transforms/LoopVectorize/scalable-assume.ll.

… 43 lines not shown …
 define void @test2(%struct.data* nocapture readonly %d) {
 ; CHECK-LABEL: @test2(
 ; CHECK: entry:
 ; CHECK: [[MASKCOND:%.*]] = icmp eq i64 %maskedptr, 0
 ; CHECK: [[MASKCOND4:%.*]] = icmp eq i64 %maskedptr3, 0
 ; CHECK: vector.body:
 ; CHECK: tail call void @llvm.assume(i1 [[MASKCOND]])
 ; CHECK-NEXT: tail call void @llvm.assume(i1 [[MASKCOND]])
-; CHECK-NEXT: tail call void @llvm.assume(i1 [[MASKCOND]])
-; CHECK-NEXT: tail call void @llvm.assume(i1 [[MASKCOND]])
 ; CHECK: tail call void @llvm.assume(i1 [[MASKCOND4]])
 ; CHECK-NEXT: tail call void @llvm.assume(i1 [[MASKCOND4]])
-; CHECK-NEXT: tail call void @llvm.assume(i1 [[MASKCOND4]])
-; CHECK-NEXT: tail call void @llvm.assume(i1 [[MASKCOND4]])
 ; CHECK: for.body:
 entry:
   %b = getelementptr inbounds %struct.data, %struct.data* %d, i64 0, i32 1
   %0 = load float*, float** %b, align 8
   %ptrint = ptrtoint float* %0 to i64
   %maskedptr = and i64 %ptrint, 31
   %maskcond = icmp eq i64 %maskedptr, 0
   %a = getelementptr inbounds %struct.data, %struct.data* %d, i64 0, i32 0
… 68 lines not shown …

llvm/test/Transforms/LoopVectorize/scalable-assume.ll

This file was copied from llvm/test/Transforms/LoopVectorize/assume.ll.

	; RUN: opt < %s -loop-vectorize -force-vector-width=2 -force-vector-interleave=2 -S \| FileCheck %s			; RUN: opt < %s -scalable-vectorization=on -force-target-supports-scalable-vectors=true -loop-vectorize -force-vector-width=2 -force-vector-interleave=2 -S \| FileCheck %s

	define void @test1(float* noalias nocapture %a, float* noalias nocapture readonly %b) {			define void @test1(float* noalias nocapture %a, float* noalias nocapture readonly %b) {
	; CHECK-LABEL: @test1(			; CHECK-LABEL: @test1(
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK: [[WIDE_LOAD:%.]] = load <2 x float>, <2 x float> {{.*}}, align 4			; CHECK: [[FCMP1:%.*]] = fcmp ogt <vscale x 2 x float>
	; CHECK: [[WIDE_LOAD1:%.]] = load <2 x float>, <2 x float> {{.*}}, align 4			; CHECK-NEXT: [[FCMP2:%.*]] = fcmp ogt <vscale x 2 x float>
	; CHECK-NEXT: [[TMP1:%.*]] = fcmp ogt <2 x float> [[WIDE_LOAD]], <float 1.000000e+02, float 1.000000e+02>			; CHECK-NEXT: [[FCMP1L0:%.*]] = extractelement <vscale x 2 x i1> [[FCMP1]], i32 0
	; CHECK-NEXT: [[TMP2:%.*]] = fcmp ogt <2 x float> [[WIDE_LOAD1]], <float 1.000000e+02, float 1.000000e+02>			; CHECK-NEXT: tail call void @llvm.assume(i1 [[FCMP1L0]])
	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x i1> [[TMP1]], i32 0			; CHECK-NEXT: [[FCMP2L0:%.*]] = extractelement <vscale x 2 x i1> [[FCMP2]], i32 0
	; CHECK-NEXT: tail call void @llvm.assume(i1 [[TMP3]])			; CHECK-NEXT: tail call void @llvm.assume(i1 [[FCMP2L0]])
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i1> [[TMP1]], i32 1
	; CHECK-NEXT: tail call void @llvm.assume(i1 [[TMP4]])
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x i1> [[TMP2]], i32 0
	; CHECK-NEXT: tail call void @llvm.assume(i1 [[TMP5]])
	; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x i1> [[TMP2]], i32 1
	; CHECK-NEXT: tail call void @llvm.assume(i1 [[TMP6]])
	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
	%arrayidx = getelementptr inbounds float, float* %b, i64 %indvars.iv			%arrayidx = getelementptr inbounds float, float* %b, i64 %indvars.iv
	%0 = load float, float* %arrayidx, align 4			%0 = load float, float* %arrayidx, align 4
	%cmp1 = fcmp ogt float %0, 1.000000e+02			%cmp1 = fcmp ogt float %0, 1.000000e+02
	tail call void @llvm.assume(i1 %cmp1)			tail call void @llvm.assume(i1 %cmp1)
	%add = fadd float %0, 1.000000e+00			%add = fadd float %0, 1.000000e+00
	%arrayidx5 = getelementptr inbounds float, float* %a, i64 %indvars.iv			%arrayidx5 = getelementptr inbounds float, float* %a, i64 %indvars.iv
	store float %add, float* %arrayidx5, align 4			store float %add, float* %arrayidx5, align 4
	%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1			%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
	%exitcond = icmp eq i64 %indvars.iv, 1599			%exitcond = icmp eq i64 %indvars.iv, 1599
	br i1 %exitcond, label %for.end, label %for.body			br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0

	for.end: ; preds = %for.body			for.end: ; preds = %for.body
	ret void			ret void
	}			}

	declare void @llvm.assume(i1) #0			declare void @llvm.assume(i1) #0

	attributes #0 = { nounwind willreturn }			attributes #0 = { nounwind willreturn }

	%struct.data = type { float, float }			%struct.data = type { float, float }

	define void @test2(%struct.data* nocapture readonly %d) {			define void @test2(float %a, float %b) {
	; CHECK-LABEL: @test2(			; CHECK-LABEL: @test2(
	; CHECK: entry:			; CHECK: entry:
	; CHECK: [[MASKCOND:%.*]] = icmp eq i64 %maskedptr, 0			; CHECK: [[MASKCOND:%.*]] = icmp eq i64 %ptrint1, 0
	; CHECK: [[MASKCOND4:%.*]] = icmp eq i64 %maskedptr3, 0			; CHECK: [[MASKCOND4:%.*]] = icmp eq i64 %ptrint2, 0
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK: tail call void @llvm.assume(i1 [[MASKCOND]])			; CHECK: tail call void @llvm.assume(i1 [[MASKCOND]])
	; CHECK-NEXT: tail call void @llvm.assume(i1 [[MASKCOND]])			; CHECK-NEXT: tail call void @llvm.assume(i1 [[MASKCOND]])
	; CHECK-NEXT: tail call void @llvm.assume(i1 [[MASKCOND]])
	; CHECK-NEXT: tail call void @llvm.assume(i1 [[MASKCOND]])
	; CHECK: tail call void @llvm.assume(i1 [[MASKCOND4]])			; CHECK: tail call void @llvm.assume(i1 [[MASKCOND4]])
	; CHECK-NEXT: tail call void @llvm.assume(i1 [[MASKCOND4]])			; CHECK-NEXT: tail call void @llvm.assume(i1 [[MASKCOND4]])
	; CHECK-NEXT: tail call void @llvm.assume(i1 [[MASKCOND4]])
	; CHECK-NEXT: tail call void @llvm.assume(i1 [[MASKCOND4]])
	; CHECK: for.body:
	entry:			entry:
	%b = getelementptr inbounds %struct.data, %struct.data* %d, i64 0, i32 1			%ptrint1 = ptrtoint float* %a to i64
	%0 = load float, float* %b, align 8			%maskcond = icmp eq i64 %ptrint1, 0
	%ptrint = ptrtoint float* %0 to i64			%ptrint2 = ptrtoint float* %b to i64
	%maskedptr = and i64 %ptrint, 31			%maskcond4 = icmp eq i64 %ptrint2, 0
	%maskcond = icmp eq i64 %maskedptr, 0
	%a = getelementptr inbounds %struct.data, %struct.data* %d, i64 0, i32 0
	%1 = load float, float* %a, align 8
	%ptrint2 = ptrtoint float* %1 to i64
	%maskedptr3 = and i64 %ptrint2, 31
	%maskcond4 = icmp eq i64 %maskedptr3, 0
	br label %for.body			br label %for.body


	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
	tail call void @llvm.assume(i1 %maskcond)			tail call void @llvm.assume(i1 %maskcond)
	%arrayidx = getelementptr inbounds float, float* %0, i64 %indvars.iv			%arrayidx = getelementptr inbounds float, float* %a, i64 %indvars.iv
	%2 = load float, float* %arrayidx, align 4			%0 = load float, float* %arrayidx, align 4
	%add = fadd float %2, 1.000000e+00			%add = fadd float %0, 1.000000e+00
	tail call void @llvm.assume(i1 %maskcond4)			tail call void @llvm.assume(i1 %maskcond4)
	%arrayidx5 = getelementptr inbounds float, float* %1, i64 %indvars.iv			%arrayidx5 = getelementptr inbounds float, float* %b, i64 %indvars.iv
	store float %add, float* %arrayidx5, align 4			store float %add, float* %arrayidx5, align 4
	%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1			%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
	%exitcond = icmp eq i64 %indvars.iv, 1599			%exitcond = icmp eq i64 %indvars.iv, 1599
	br i1 %exitcond, label %for.end, label %for.body			br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0

	for.end: ; preds = %for.body			for.end: ; preds = %for.body
	ret void			ret void
	}			}

	; Test case for PR43620. Make sure we can vectorize with predication in presence			; Test case for PR43620. Make sure we can vectorize with predication in presence
	; of assume calls. For now, check that we drop all assumes in predicated blocks			; of assume calls. For now, check that we drop all assumes in predicated blocks
	; in the vector body.			; in the vector body.
	define void @predicated_assume(float* noalias nocapture readonly %a, float* noalias nocapture %b, i32 %n) {			define void @predicated_assume(float* noalias nocapture readonly %a, float* noalias nocapture %b, i64 %n) {
	; Check that the vector.body does not contain any assumes.			; Check that the vector.body does not contain any assumes.
	; CHECK-LABEL: @predicated_assume(			; CHECK-LABEL: @predicated_assume(
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NOT: llvm.assume			; CHECK-NOT: llvm.assume
	; CHECK: for.body:			; CHECK: for.body:
	entry:			entry:
	%cmp15 = icmp eq i32 %n, 0
	br i1 %cmp15, label %for.cond.cleanup, label %for.body.preheader

	for.body.preheader: ; preds = %entry
	%0 = zext i32 %n to i64
	br label %for.body			br label %for.body

	for.cond.cleanup.loopexit: ; preds = %if.end5
	br label %for.cond.cleanup

	for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry
	ret void

	for.body: ; preds = %for.body.preheader, %if.end5			for.body: ; preds = %for.body.preheader, %if.end5
	%indvars.iv = phi i64 [ 0, %for.body.preheader ], [ %indvars.iv.next, %if.end5 ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %if.end5 ]
	%cmp1 = icmp ult i64 %indvars.iv, 495616			%cmp1 = icmp ult i64 %indvars.iv, 495616
	br i1 %cmp1, label %if.end5, label %if.else			br i1 %cmp1, label %if.end5, label %if.else

	if.else: ; preds = %for.body			if.else: ; preds = %for.body
				fhahn: is this needed? (same for some other tests)
				david-arm: Maybe not? I didn't write these tests, but copied them from assume.ll. I can clean them up though.
	%cmp2 = icmp ult i64 %indvars.iv, 991232			%cmp2 = icmp ult i64 %indvars.iv, 991232
	tail call void @llvm.assume(i1 %cmp2)			tail call void @llvm.assume(i1 %cmp2)
	br label %if.end5			br label %if.end5
				fhahn: is this needed?

	if.end5: ; preds = %for.body, %if.else			if.end5: ; preds = %for.body, %if.else
	%x.0 = phi float [ 4.200000e+01, %if.else ], [ 2.300000e+01, %for.body ]			%x.0 = phi float [ 4.200000e+01, %if.else ], [ 2.300000e+01, %for.body ]
	%arrayidx = getelementptr inbounds float, float* %a, i64 %indvars.iv			%arrayidx = getelementptr inbounds float, float* %a, i64 %indvars.iv
				fhahn: is this needed?
	%1 = load float, float* %arrayidx, align 4			%0 = load float, float* %arrayidx, align 4
	%mul = fmul float %x.0, %1			%mul = fmul float %x.0, %0
	%arrayidx7 = getelementptr inbounds float, float* %b, i64 %indvars.iv			%arrayidx7 = getelementptr inbounds float, float* %b, i64 %indvars.iv
	store float %mul, float* %arrayidx7, align 4			store float %mul, float* %arrayidx7, align 4
	%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1			%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
	%cmp = icmp eq i64 %indvars.iv.next, %0			%cmp = icmp eq i64 %indvars.iv.next, %n
	br i1 %cmp, label %for.cond.cleanup.loopexit, label %for.body			br i1 %cmp, label %for.cond.cleanup, label %for.body, !llvm.loop !0

				for.cond.cleanup: ; preds = %if.end5, %entry
				ret void
	}			}

				!0 = distinct !{!0, !1}
				!1 = !{!"llvm.loop.vectorize.scalable.enable", i1 true}

llvm/test/Transforms/LoopVectorize/scalable-lifetime.ll

This file was added.

				; RUN: opt -S -scalable-vectorization=on -force-target-supports-scalable-vectors=true -loop-vectorize -force-vector-width=2 -force-vector-interleave=1 < %s | FileCheck %s

				target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"

				; Make sure we can vectorize loops which contain lifetime markers.

				define void @test(i32 *%d) {
				; CHECK-LABEL: @test(
				; CHECK: entry:
				; CHECK: [[ALLOCA:%.*]] = alloca [1024 x i32], align 16
				; CHECK-NEXT: [[BC:%.*]] = bitcast [1024 x i32]* [[ALLOCA]] to i8*
				; CHECK: vector.body:
				; CHECK: call void @llvm.lifetime.end.p0i8(i64 4096, i8* [[BC]])
				; CHECK: store <vscale x 2 x i32>
				; CHECK: call void @llvm.lifetime.start.p0i8(i64 4096, i8* [[BC]])

				entry:
				%arr = alloca [1024 x i32], align 16
				%0 = bitcast [1024 x i32]* %arr to i8*
				call void @llvm.lifetime.start.p0i8(i64 4096, i8* %0) #1
				br label %for.body

				for.body:
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				call void @llvm.lifetime.end.p0i8(i64 4096, i8* %0) #1
				%arrayidx = getelementptr inbounds i32, i32* %d, i64 %indvars.iv
				%1 = load i32, i32* %arrayidx, align 8
				store i32 100, i32* %arrayidx, align 8
				call void @llvm.lifetime.start.p0i8(i64 4096, i8* %0) #1
				%indvars.iv.next = add i64 %indvars.iv, 1
				%lftr.wideiv = trunc i64 %indvars.iv.next to i32
				%exitcond = icmp ne i32 %lftr.wideiv, 128
				br i1 %exitcond, label %for.body, label %for.end, !llvm.loop !0

				for.end:
				call void @llvm.lifetime.end.p0i8(i64 4096, i8* %0) #1
				ret void
				}

				; CHECK-LABEL: @testloopvariant(
				; CHECK: entry:
				; CHECK: [[ALLOCA:%.*]] = alloca [1024 x i32], align 16
				; CHECK: vector.ph:
				; CHECK: [[TMP1:%.*]] = insertelement <vscale x 2 x [1024 x i32]*> poison, [1024 x i32]* %arr, i32 0
				; CHECK-NEXT: [[SPLAT_ALLOCA:%.*]] = shufflevector <vscale x 2 x [1024 x i32]*> [[TMP1]], <vscale x 2 x [1024 x i32]*> poison, <vscale x 2 x i32> zeroinitializer
				; CHECK: vector.body:
				; CHECK: [[BC_ALLOCA:%.*]] = bitcast <vscale x 2 x [1024 x i32]*> [[SPLAT_ALLOCA]] to <vscale x 2 x i8*>
				; CHECK-NEXT: [[ONE_LIFETIME:%.*]] = extractelement <vscale x 2 x i8*> [[BC_ALLOCA]], i32 0
				; CHECK-NEXT: call void @llvm.lifetime.end.p0i8(i64 4096, i8* [[ONE_LIFETIME]])
				; CHECK: store <vscale x 2 x i32>
				; CHECK-NEXT: call void @llvm.lifetime.start.p0i8(i64 4096, i8* [[ONE_LIFETIME]])

				define void @testloopvariant(i32 *%d) {
				entry:
				%arr = alloca [1024 x i32], align 16
				br label %for.body

				for.body:
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%0 = getelementptr [1024 x i32], [1024 x i32]* %arr, i32 0, i64 %indvars.iv
				%1 = bitcast [1024 x i32]* %arr to i8*
				call void @llvm.lifetime.end.p0i8(i64 4096, i8* %1) #1
				%arrayidx = getelementptr inbounds i32, i32* %d, i64 %indvars.iv
				%2 = load i32, i32* %arrayidx, align 8
				store i32 100, i32* %arrayidx, align 8
				call void @llvm.lifetime.start.p0i8(i64 4096, i8* %1) #1
				%indvars.iv.next = add i64 %indvars.iv, 1
				%lftr.wideiv = trunc i64 %indvars.iv.next to i32
				%exitcond = icmp ne i32 %lftr.wideiv, 128
				br i1 %exitcond, label %for.body, label %for.end, !llvm.loop !0

				for.end:
				ret void
				}

				declare void @llvm.lifetime.start.p0i8(i64, i8* nocapture) #1

				declare void @llvm.lifetime.end.p0i8(i64, i8* nocapture) #1

				!0 = distinct !{!0, !1}
				!1 = !{!"llvm.loop.vectorize.scalable.enable", i1 true}

llvm/test/Transforms/LoopVectorize/scalable-noalias-scope-decl.ll

This file was added.

				; RUN: opt < %s -scalable-vectorization=on -force-target-supports-scalable-vectors=true -loop-vectorize -force-vector-width=4 -force-vector-interleave=2 -S | FileCheck %s

				define void @test1(float* noalias nocapture %a, float* noalias nocapture readonly %b) {
				entry:
				br label %for.body

				; CHECK-LABEL: @test1
				; CHECK: vector.body:
				; CHECK: @llvm.experimental.noalias.scope.decl
				; CHECK-NOT: @llvm.experimental.noalias.scope.decl
				; CHECK: for.body:
				; CHECK: @llvm.experimental.noalias.scope.decl
				; CHECK-NOT: @llvm.experimental.noalias.scope.decl
				; CHECK: ret void

				for.body: ; preds = %for.body, %entry
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%arrayidx = getelementptr inbounds float, float* %b, i64 %indvars.iv
				%0 = load float, float* %arrayidx, align 4
				%cmp1 = fcmp ogt float %0, 1.000000e+02
				tail call void @llvm.experimental.noalias.scope.decl(metadata !0)
				%add = fadd float %0, 1.000000e+00
				%arrayidx5 = getelementptr inbounds float, float* %a, i64 %indvars.iv
				store float %add, float* %arrayidx5, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv, 1599
				br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !5

				for.end: ; preds = %for.body
				ret void
				}

				declare void @llvm.experimental.noalias.scope.decl(metadata)

				%struct.data = type { float, float }

				define void @test2(float* %a, float* %b) {
				; CHECK-LABEL: @test2
				; CHECK: vector.body:
				; CHECK: @llvm.experimental.noalias.scope.decl(metadata [[SCOPE0_LIST:!.*]])
				; CHECK: @llvm.experimental.noalias.scope.decl(metadata [[SCOPE4_LIST:!.*]])
				; CHECK-NOT: @llvm.experimental.noalias.scope.decl
				; CHECK: for.body:
				; CHECK: @llvm.experimental.noalias.scope.decl(metadata [[SCOPE0_LIST]])
				; CHECK: @llvm.experimental.noalias.scope.decl(metadata [[SCOPE4_LIST]])
				; CHECK-NOT: @llvm.experimental.noalias.scope.decl
				; CHECK: ret void
				entry:
				%ptrint = ptrtoint float* %b to i64
				%maskcond = icmp eq i64 %ptrint, 0
				%ptrint2 = ptrtoint float* %a to i64
				%maskcond4 = icmp eq i64 %ptrint2, 0
				br label %for.body

				for.body: ; preds = %for.body, %entry
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				tail call void @llvm.experimental.noalias.scope.decl(metadata !0)
				%arrayidx = getelementptr inbounds float, float* %b, i64 %indvars.iv
				%0 = load float, float* %arrayidx, align 4
				%add = fadd float %0, 1.000000e+00
				tail call void @llvm.experimental.noalias.scope.decl(metadata !4)
				%arrayidx5 = getelementptr inbounds float, float* %a, i64 %indvars.iv
				store float %add, float* %arrayidx5, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv, 1599
				br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !5

				for.end: ; preds = %for.body
				ret void
				}

				define void @predicated_noalias_scope_decl(float* noalias nocapture readonly %a, float* noalias nocapture %b, i64 %n) {

				; Check that the vector.body still contains an llvm.experimental.noalias.scope.decl

				; CHECK-LABEL: @predicated_noalias_scope_decl(
				; CHECK: vector.body:
				; CHECK: call void @llvm.experimental.noalias.scope.decl
				; CHECK-NOT: @llvm.experimental.noalias.scope.decl
				; CHECK: scalar.ph:
				; CHECK-NOT: @llvm.experimental.noalias.scope.decl
				; CHECK: if.else:
				; CHECK: call void @llvm.experimental.noalias.scope.decl
				; CHECK-NOT: @llvm.experimental.noalias.scope.decl
				; CHECK: }

				entry:
				br label %for.body

				for.body: ; preds = %entry, %if.end5
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %if.end5 ]
				%cmp1 = icmp ult i64 %indvars.iv, 495616
				br i1 %cmp1, label %if.end5, label %if.else

				if.else: ; preds = %for.body
				%cmp2 = icmp ult i64 %indvars.iv, 991232
				tail call void @llvm.experimental.noalias.scope.decl(metadata !0)
				br label %if.end5

				if.end5: ; preds = %for.body, %if.else
				%x.0 = phi float [ 4.200000e+01, %if.else ], [ 2.300000e+01, %for.body ]
				%arrayidx = getelementptr inbounds float, float* %a, i64 %indvars.iv
				%0 = load float, float* %arrayidx, align 4
				%mul = fmul float %x.0, %0
				%arrayidx7 = getelementptr inbounds float, float* %b, i64 %indvars.iv
				store float %mul, float* %arrayidx7, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%cmp = icmp eq i64 %indvars.iv.next, %n
				br i1 %cmp, label %for.cond.cleanup, label %for.body, !llvm.loop !5

				for.cond.cleanup: ; preds = %if.end5
				ret void
				}

				!0 = !{ !1 }
				!1 = distinct !{ !1, !2 }
				!2 = distinct !{ !2 }
				!3 = distinct !{ !3, !2 }
				!4 = !{ !3 }
				!5 = distinct !{!5, !6}
				!6 = !{!"llvm.loop.vectorize.scalable.enable", i1 true}

				; CHECK: [[SCOPE0_LIST]] = !{[[SCOPE0:!.*]]}
				; CHECK: [[SCOPE0]] = distinct !{[[SCOPE0]], [[SCOPE0_DOM:!.*]]}
				; CHECK: [[SCOPE0_DOM]] = distinct !{[[SCOPE0_DOM]]}
				; CHECK: [[SCOPE4_LIST]] = !{[[SCOPE4:!.*]]}
				; CHECK: [[SCOPE4]] = distinct !{[[SCOPE4]], [[SCOPE0_DOM]]}