This is an archive of the discontinued LLVM Phabricator instance.

Differential D100141

[nofree] Restrict semantics to memory visible to caller
ClosedPublic

Authored by reames on Apr 8 2021, 2:34 PM.

Details

Reviewers

bollu
jdoerfert
nlopes
apilipenko
nhaehnle

Commits

rGff55d01a8e1b: [nofree] Restrict semantics to memory visible to caller

Summary

This patch clarifies the semantics of the nofree function attribute to make clear that it provides an "as if" semantic. That is, a nofree function is guaranteed not to free memory which existed before the call, but might allocate and then deallocate that same memory within the lifetime of the callee.
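The "as if" rule described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyHeap` and `respectsNoFree` are hypothetical names invented for this illustration and are not part of LLVM. The model only checks the property the summary states, namely that every allocation which existed before the call still exists after it, while allocations created and destroyed inside the callee are ignored.

```cpp
#include <cassert>
#include <functional>
#include <set>

// Hypothetical toy heap: tracks which allocation ids are currently live.
struct ToyHeap {
  std::set<int> live;
  int next_id = 0;

  // Creates a fresh allocation and returns its id. Ids are never reused.
  int allocate() {
    live.insert(next_id);
    return next_id++;
  }

  // Frees the allocation with the given id (no-op if it is not live).
  void release(int id) { live.erase(id); }
};

// Returns true if `callee` respected the "as if" nofree rule in this model:
// every allocation that was live before the call is still live afterwards.
// Allocations the callee created itself are deliberately not checked.
bool respectsNoFree(ToyHeap &heap,
                    const std::function<void(ToyHeap &)> &callee) {
  const std::set<int> before = heap.live;  // snapshot of caller-visible memory
  callee(heap);
  for (int id : before)
    if (heap.live.count(id) == 0)
      return false;  // something that predates the call was freed
  return true;
}
```

In this model, a callee that allocates a scratch buffer and frees it again passes the check, while a callee that frees an allocation made by its caller does not.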

This is the result of the discussion on llvm-dev under the thread "Ambiguity in the nofree function attribute".

The most important part of this change is the LangRef wording. The rest is minor comment changes to emphasize the new semantics where code was accidentally consistent, and fix one place which wasn't consistent. That one place is currently narrowly used as it is primarily part of the ongoing (and not yet enabled) deref-at-point semantics work.

Event Timeline

reames created this revision. Apr 8 2021, 2:34 PM

Herald added a reviewer: bollu. Apr 8 2021, 2:34 PM

Herald added subscribers: dexonsmith, jdoerfert, asbirlea and 2 others.

reames requested review of this revision. Apr 8 2021, 2:34 PM

Herald added a project: Restricted Project. Apr 8 2021, 2:34 PM

Herald added a subscriber: llvm-commits.

Realized the alternative was strictly more in sync with existing code, so disregard this.

Harbormaster completed remote builds in B97821: Diff 336233. Apr 8 2021, 3:27 PM

reames updated this revision to Diff 337263. Apr 13 2021, 2:55 PM

reames edited the summary of this revision.

reames added reviewers: jdoerfert, nlopes, apilipenko, nhaehnle.

reames set the repository for this revision to rG LLVM Github Monorepo.

Harbormaster completed remote builds in B98562: Diff 337263. Apr 13 2021, 8:00 PM

LGTM

This revision is now accepted and ready to land. Apr 14 2021, 3:27 AM

Thank you for doing this.

The interaction with multi-threading and capturing makes me mildly nervous. Perhaps I'm just confused, but the second paragraph of the definition there seems to imply that a nofree (but not-nosync) function f is allowed to free any memory that had a pointer to it captured somewhere. But this seems to contradict the first paragraph, which says that f "does not, directly or indirectly, call a memory-deallocation function (free, for example) on a memory allocation which existed before the call."

So which is it?

If f communicates to another thread in a way that causes that thread to free memory, does that count as an indirect call to a memory-deallocation function? If not, why does capturing the pointer make a difference? An argument to f could be temporarily passed to another thread even if it is nocapture...

I have a feeling that this confusion already existed in the previous definition.

Suggestion to make the text clearer.

llvm/docs/LangRef.rst
1580–1594

In D100141#2691664, @nhaehnle wrote:

> Thank you for doing this.
>
> The interaction with multi-threading and capturing makes me mildly nervous. Perhaps I'm just confused, but the second paragraph of the definition there seems to imply that a nofree (but not-nosync) function f is allowed to free any memory that had a pointer to it captured somewhere. But this seems to contradict the first paragraph, which says that f "does not, directly or indirectly, call a memory-deallocation function (free, for example) on a memory allocation which existed before the call."

That certainly wasn't the intent. Which bit of wording gives that impression?

(See the bit below which is essentially the inverse of this case, and is intentional.)

> So which is it?
>
> If f communicates to another thread in a way that causes that thread to free memory, does that count as an indirect call to a memory-deallocation function? If not, why does capturing the pointer make a difference? An argument to f could be temporarily passed to another thread even if it is nocapture...

To the best of my reading of the current code and specification, no, having another thread free an object on behalf of 'f' does not violate a nofree annotation on 'f'. The reasoning here is that a) 'f' is not the one actually freeing, and b) if we picked anything else as a semantic, inferring nofree would require concurrency-aware full-program analysis.

You can divide the above into two cases:

1. The object has already been captured before the call.
2. The object is captured by the call.

Having some other thread free the captured object in case 1 is clearly allowed. Case 2 appears not to have been considered in the current wording from what I can tell, and probably needs further consideration. I do request we separate that into a separate patch though.

> I have a feeling that this confusion already existed in the previous definition.
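The concurrency point in this exchange can be sketched as a tiny predicate. It mirrors only the `F->doesNotFreeMemory() && F->hasNoSync()` check that this patch keeps for arguments in `Value::canBeFreed()`; the struct and function names below are hypothetical stand-ins, and the real function handles further cases (constants, byval-style arguments, garbage collection) that are omitted here.

```cpp
#include <cassert>

// Hypothetical stand-in for the two function attributes under discussion.
struct ToyFunctionAttrs {
  bool nofree;  // the function itself never frees caller-visible memory
  bool nosync;  // the function cannot communicate with another thread
};

// Simplified model: can memory behind an incoming argument be freed while
// the callee runs? Only nofree AND nosync together rule it out, because a
// nofree function without nosync may arrange for another thread to free it.
bool argumentCanBeFreedInCallee(const ToyFunctionAttrs &f) {
  return !(f.nofree && f.nosync);
}
```

Under this model, `nofree` alone still leaves room for another thread to free the object, which is the situation described for case 1 above.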

@jdoerfert - Thanks for the wording suggestions; I took all of them.

LGTM, @nhaehnle should comment on the new wording probably.

reames mentioned this in D100551: [deref] No need to check nosync in addition to nofree. Apr 16 2021, 11:26 AM

In D100141#2695406, @jdoerfert wrote:

> LGTM, @nhaehnle should comment on the new wording probably.

JFYI, I don't intend to hold for @nhaehnle as the concern he raised appears orthogonal to the change being made in this review. I am happy to continue this discussion and post another patch if we think further clarification is warranted for the concurrency case. I am leaning in that direction myself.

Doing a final rebuild now, and will submit after that.

This revision was landed with ongoing or failed builds. Apr 16 2021, 11:39 AM

Closed by commit rGff55d01a8e1b: [nofree] Restrict semantics to memory visible to caller (authored by reames).

This revision was automatically updated to reflect the committed changes.

reames added a commit: rGff55d01a8e1b: [nofree] Restrict semantics to memory visible to caller.

Posted https://reviews.llvm.org/D100676 with an attempt to clarify the nocapture case raised in my response to @nhaehnle above.

I've updated Alive2 with the new semantics and I see one regression:
llvm/test/Transforms/InstCombine/malloc-free-delete.ll

define void @test14(* %foo) nofree {
  free * %foo
  ret void
}
=>
define void @test14(* %foo) nofree {
  call void #trap() nowrite noreturn
  assume i1 0
}

Transformation doesn't verify!

ERROR: Source is more defined than target

Example:
* %foo = null

free(null) is a no-op, so I think the test case is buggy. This transformation can only be done if the argument is non-null.
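The constraint pointed out here can be sketched as a small decision function. This is an illustrative model only, not the InstCombine code and not the fix that was later posted: `FreeFold`, `ToyFreeOperand` and `foldFree` are hypothetical names, and the three boolean facts are assumed to be supplied by some earlier analysis.

```cpp
#include <cassert>

// Possible outcomes when simplifying a call to free in this toy model.
enum class FreeFold { EraseCall, MarkUnreachable, Keep };

// Hypothetical summary of what is known about the operand of the free call.
struct ToyFreeOperand {
  bool is_null_constant;   // operand is literally the null pointer
  bool known_nonnull;      // operand has been proven non-null
  bool covered_by_nofree;  // operand is an argument freed under nofree
};

FreeFold foldFree(const ToyFreeOperand &op) {
  // free(null) is a no-op, so the call can simply be removed.
  if (op.is_null_constant)
    return FreeFold::EraseCall;
  // Freeing under nofree is only undefined behaviour when the pointer is
  // known non-null; otherwise the call might be a harmless free(null).
  if (op.covered_by_nofree && op.known_nonnull)
    return FreeFold::MarkUnreachable;
  return FreeFold::Keep;
}
```

With these rules, the `@test14` example above would be left alone, because nothing proves `%foo` non-null.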

In D100141#2697374, @nlopes wrote:

> I've updated Alive2 with the new semantics and I see one regression:
> llvm/test/Transforms/InstCombine/malloc-free-delete.ll
>
> define void @test14(* %foo) nofree {
>   free * %foo
>   ret void
> }
> =>
> define void @test14(* %foo) nofree {
>   call void #trap() nowrite noreturn
>   assume i1 0
> }
>
> Transformation doesn't verify!
>
> ERROR: Source is more defined than target
>
> Example:
> * %foo = null
>
> free(null) is a no-op, so I think the test case is buggy. This transformation can only be done if the argument is non-null.

Right, the transformation should be llvm.assume(%foo == null).

In D100141#2697374, @nlopes wrote:

> I've updated Alive2 with the new semantics and I see one regression:

Ok, wow. Thank you!

This patch and the alive2 tooling combined just paid off big time.

In the background, I've been trying to figure out a miscompile (https://github.com/emscripten-core/emscripten/issues/9443), and I'm pretty sure this is exactly the issue. Or at least, it seems pretty likely.

I will be posting a fix for the issue identified here later today.

Thanks again!

dexonsmith removed a subscriber: dexonsmith. Apr 19 2021, 11:14 AM

reames mentioned this in D100779: free(nullptr) does not violate the nofree specification. Apr 19 2021, 11:24 AM

Proposed fix for the issue Nuno found here: https://reviews.llvm.org/D100779

reames mentioned this in rG3b1474cab26b: free(nullptr) does not violate the nofree specification. Apr 20 2021, 9:08 AM

nhaehnle mentioned this in D101701: [nofree] Refine concurrency requirements. May 1 2021, 2:47 PM

Revision Contents

Path                                                         Size
llvm/docs/LangRef.rst                                        19 lines
llvm/lib/IR/Value.cpp                                        17 lines
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp     6 lines
llvm/test/Transforms/LICM/hoist-alloc.ll                     4 lines
llvm/test/Transforms/LoopVectorize/X86/load-deref-pred.ll    32 lines

Diff 336233

llvm/docs/LangRef.rst

  ``noduplicate``
  its parent function.
  A function containing a ``noduplicate`` call may still
  be an inlining candidate, provided that the call is not
  duplicated by inlining. That implies that the function has
  internal linkage and only has one call site, so the original
  call is dead after inlining.
  ``nofree``
  This function attribute indicates that the function does not, directly or
- indirectly, call a memory-deallocation function (free, for example). As a
- result, uncaptured pointers that are known to be dereferenceable prior to a
- call to a function with the ``nofree`` attribute are still known to be
- dereferenceable after the call (the capturing condition is necessary in
- environments where the function might communicate the pointer to another thread
- which then deallocates the memory).
+ indirectly, call a memory-deallocation function (free, for example) on
+ a memory allocation which existed before the call.
+ As a result, uncaptured pointers that are known to be dereferenceable
+ prior to a call to a function with the ``nofree`` attribute are still
+ known to be dereferenceable after the call (the capturing condition is
+ necessary in environments where the function might communicate the
+ pointer to another thread which then deallocates the memory).
+ A ``nofree`` function is explicltly allowed to free memory which it
+ allocated or (if not ``nosync``) arrange for another thread to free
+ said memory on it's behalf. As a result, perhaphs suprisingly, a
+ ``nofree`` can return a pointer to a previously deallocated memory object.
  ``noimplicitfloat``

jdoerfert:

  ``nofree``
  This function attribute indicates that the function does not, directly or
- indirectly, call a memory-deallocation function (free, for example) on
- a memory allocation which existed before the call.
+ transitively, call a memory-deallocation function (``free``, for example)
+ on a memory allocation which existed before the call.
  As a result, uncaptured pointers that are known to be dereferenceable
  prior to a call to a function with the ``nofree`` attribute are still
- known to be dereferenceable after the call (the capturing condition is
+ known to be dereferenceable after the call. The capturing condition is
  necessary in environments where the function might communicate the
- pointer to another thread which then deallocates the memory).
+ pointer to another thread which then deallocates the memory. Alternatively,
+ ``nosync`` would ensure such communication cannot happen and even captured
+ pointers cannot be freed by the function.
  A ``nofree`` function is explicitly allowed to free memory which it
  allocated or (if not ``nosync``) arrange for another thread to free
- said memory on it's behalf. As a result, perhaps surprisingly, a
+ any memory on it's behalf. As a result, perhaps surprisingly, a
  ``nofree`` function can return a pointer to a previously deallocated
  memory object.
  ``noimplicitfloat``

  This attributes disables implicit floating-point instructions.
  ``noinline``
  This attribute indicates that the inliner should never inline this
  function in any situation. This attribute may not be used together
  with the ``alwaysinline`` attribute.
  ``nomerge``
  This attribute indicates that calls to this function should never be merged
  during optimization. For example, it will prevent tail merging otherwise

llvm/lib/IR/Value.cpp

  bool Value::canBeFreed() const {

    // Cases that can simply never be deallocated
    // *) Constants aren't allocated per se, thus not deallocated either.
    if (isa<Constant>(this))
      return false;

    // Handle byval/byref/sret/inalloca/preallocated arguments. The storage
    // lifetime is guaranteed to be longer than the callee's lifetime.
-   if (auto *A = dyn_cast<Argument>(this))
+   if (auto *A = dyn_cast<Argument>(this)) {
  Lint: Pre-merge checks: clang-tidy: warning: 'auto A' can be declared as 'const auto A' [llvm-qualified-auto]
      if (A->hasPointeeInMemoryValueAttr())
        return false;
+     // A pointer to an object in a function which neither frees, nor can arrange
+     // for another thread to free on its behalf, can not be freed in the scope
+     // of the function. Note that this logic is restricted to memory
+     // allocations in existance before the call; a nofree function is allowed
+     // to free memory it allocated.
+     const Function *F = A->getParent();
+     if (F->doesNotFreeMemory() && F->hasNoSync())
+       return false;
+   }

    const Function *F = nullptr;
    if (auto *I = dyn_cast<Instruction>(this))
      F = I->getFunction();
    if (auto *A = dyn_cast<Argument>(this))
      F = A->getParent();

    if (!F)
      return true;

-   // A pointer to an object in a function which neither frees, nor can arrange
-   // for another thread to free on its behalf, can not be freed in the scope
-   // of the function.
-   if (F->doesNotFreeMemory() && F->hasNoSync())
-     return false;

    // With garbage collection, deallocation typically occurs solely at or after
    // safepoints. If we're compiling for a collector which uses the
    // gc.statepoint infrastructure, safepoints aren't explicitly present
    // in the IR until after lowering from abstract to physical machine model.
    // The collector could chose to mix explicit deallocation and gc'd objects
    // which is why we need the explicit opt in on a per collector basis.
    if (!F->hasGC())
      return true;

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp

  Instruction *InstCombinerImpl::visitFree(CallInst &FI) {
    // If we have 'free null' delete the instruction. This can happen in stl code
    // when lots of inlining happens.
    if (isa<ConstantPointerNull>(Op))
      return eraseInstFromFunction(FI);

    // If we free a pointer we've been explicitly told won't be freed, this
    // would be full UB and thus we can conclude this is unreachable. Cases:
    // 1) freeing a pointer which is explicitly nofree
-   // 2) calling free from a call site marked nofree
-   // 3) calling free in a function scope marked nofree
+   // 2) calling free from a call site marked nofree (TODO: can generalize
+   // for non-arguments)
+   // 3) calling free in a function scope marked nofree (when we can prove
+   // the allocation existed before the start of the function scope)
    if (auto *A = dyn_cast<Argument>(Op->stripPointerCasts()))
      if (A->hasAttribute(Attribute::NoFree) ||
          FI.hasFnAttr(Attribute::NoFree) ||
          FI.getFunction()->hasFnAttribute(Attribute::NoFree)) {
        // Leave a marker since we can't modify the CFG here.
        CreateNonTerminatorUnreachable(&FI);
        return eraseInstFromFunction(FI);
      }

llvm/test/Transforms/LICM/hoist-alloc.ll

  }

  define i8 @test_hoist_malloc_leak() nofree nosync {
  ; CHECK-LABEL: @test_hoist_malloc_leak(
  ; CHECK-NEXT: entry:
  ; CHECK-NEXT: [[A_RAW:%.*]] = call nonnull i8* @malloc(i64 32)
  ; CHECK-NEXT: call void @init(i8* [[A_RAW]])
  ; CHECK-NEXT: [[ADDR:%.*]] = getelementptr i8, i8* [[A_RAW]], i32 31
- ; CHECK-NEXT: [[RES:%.*]] = load i8, i8* [[ADDR]], align 1
  ; CHECK-NEXT: br label [[FOR_BODY:%.*]]
  ; CHECK: for.body:
  ; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[IV_NEXT:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]
  ; CHECK-NEXT: call void @unknown()
+ ; CHECK-NEXT: [[RES:%.*]] = load i8, i8* [[ADDR]], align 1
  ; CHECK-NEXT: call void @use(i8 [[RES]])
  ; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
  ; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[IV_NEXT]], 200
  ; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END:%.*]], label [[FOR_BODY]]
  ; CHECK: for.end:
  ; CHECK-NEXT: [[RES_LCSSA:%.*]] = phi i8 [ [[RES]], [[FOR_BODY]] ]
  ; CHECK-NEXT: ret i8 [[RES_LCSSA]]
  ;
  [...]
  }

  define i8 @test_hoist_allocsize_leak() nofree nosync {
  ; CHECK-LABEL: @test_hoist_allocsize_leak(
  ; CHECK-NEXT: entry:
  ; CHECK-NEXT: [[A_RAW:%.*]] = call nonnull i8* @my_alloc(i64 32)
  ; CHECK-NEXT: call void @init(i8* [[A_RAW]])
  ; CHECK-NEXT: [[ADDR:%.*]] = getelementptr i8, i8* [[A_RAW]], i32 31
- ; CHECK-NEXT: [[RES:%.*]] = load i8, i8* [[ADDR]], align 1
  ; CHECK-NEXT: br label [[FOR_BODY:%.*]]
  ; CHECK: for.body:
  ; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[IV_NEXT:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]
  ; CHECK-NEXT: call void @unknown()
+ ; CHECK-NEXT: [[RES:%.*]] = load i8, i8* [[ADDR]], align 1
  ; CHECK-NEXT: call void @use(i8 [[RES]])
  ; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
  ; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[IV_NEXT]], 200
  ; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END:%.*]], label [[FOR_BODY]]
  ; CHECK: for.end:
  ; CHECK-NEXT: [[RES_LCSSA:%.*]] = phi i8 [ [[RES]], [[FOR_BODY]] ]
  ; CHECK-NEXT: ret i8 [[RES_LCSSA]]
  ;

llvm/test/Transforms/LoopVectorize/X86/load-deref-pred.ll

	Show First 20 Lines • Show All 2,293 Lines • ▼ Show 20 Lines
	; CHECK-NEXT: [[TMP62:%.*]] = insertelement <4 x i1> [[TMP61]], i1 [[TMP58]], i32 2			; CHECK-NEXT: [[TMP62:%.*]] = insertelement <4 x i1> [[TMP61]], i1 [[TMP58]], i32 2
	; CHECK-NEXT: [[TMP63:%.*]] = insertelement <4 x i1> [[TMP62]], i1 [[TMP59]], i32 3			; CHECK-NEXT: [[TMP63:%.*]] = insertelement <4 x i1> [[TMP62]], i1 [[TMP59]], i32 3
	; CHECK-NEXT: [[TMP64:%.]] = getelementptr inbounds i32, i32 [[BASE]], i64 [[TMP0]]			; CHECK-NEXT: [[TMP64:%.]] = getelementptr inbounds i32, i32 [[BASE]], i64 [[TMP0]]
	; CHECK-NEXT: [[TMP65:%.]] = getelementptr inbounds i32, i32 [[BASE]], i64 [[TMP4]]			; CHECK-NEXT: [[TMP65:%.]] = getelementptr inbounds i32, i32 [[BASE]], i64 [[TMP4]]
	; CHECK-NEXT: [[TMP66:%.]] = getelementptr inbounds i32, i32 [[BASE]], i64 [[TMP8]]			; CHECK-NEXT: [[TMP66:%.]] = getelementptr inbounds i32, i32 [[BASE]], i64 [[TMP8]]
	; CHECK-NEXT: [[TMP67:%.]] = getelementptr inbounds i32, i32 [[BASE]], i64 [[TMP12]]			; CHECK-NEXT: [[TMP67:%.]] = getelementptr inbounds i32, i32 [[BASE]], i64 [[TMP12]]
	; CHECK-NEXT: [[TMP68:%.]] = getelementptr inbounds i32, i32 [[TMP64]], i32 0			; CHECK-NEXT: [[TMP68:%.]] = getelementptr inbounds i32, i32 [[TMP64]], i32 0
	; CHECK-NEXT: [[TMP69:%.]] = bitcast i32 [[TMP68]] to <4 x i32>*			; CHECK-NEXT: [[TMP69:%.]] = bitcast i32 [[TMP68]] to <4 x i32>*
	; CHECK-NEXT: [[WIDE_LOAD:%.]] = load <4 x i32>, <4 x i32> [[TMP69]], align 4			; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.]] = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32> [[TMP69]], i32 4, <4 x i1> [[TMP39]], <4 x i32> poison)
	; CHECK-NEXT: [[TMP70:%.]] = getelementptr inbounds i32, i32 [[TMP64]], i32 4			; CHECK-NEXT: [[TMP70:%.]] = getelementptr inbounds i32, i32 [[TMP64]], i32 4
	; CHECK-NEXT: [[TMP71:%.]] = bitcast i32 [[TMP70]] to <4 x i32>*			; CHECK-NEXT: [[TMP71:%.]] = bitcast i32 [[TMP70]] to <4 x i32>*
	; CHECK-NEXT: [[WIDE_LOAD4:%.]] = load <4 x i32>, <4 x i32> [[TMP71]], align 4			; CHECK-NEXT: [[WIDE_MASKED_LOAD4:%.]] = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32> [[TMP71]], i32 4, <4 x i1> [[TMP47]], <4 x i32> poison)
	; CHECK-NEXT: [[TMP72:%.]] = getelementptr inbounds i32, i32 [[TMP64]], i32 8			; CHECK-NEXT: [[TMP72:%.]] = getelementptr inbounds i32, i32 [[TMP64]], i32 8
	; CHECK-NEXT: [[TMP73:%.]] = bitcast i32 [[TMP72]] to <4 x i32>*			; CHECK-NEXT: [[TMP73:%.]] = bitcast i32 [[TMP72]] to <4 x i32>*
	; CHECK-NEXT: [[WIDE_LOAD5:%.]] = load <4 x i32>, <4 x i32> [[TMP73]], align 4			; CHECK-NEXT: [[WIDE_MASKED_LOAD5:%.]] = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32> [[TMP73]], i32 4, <4 x i1> [[TMP55]], <4 x i32> poison)
	; CHECK-NEXT: [[TMP74:%.]] = getelementptr inbounds i32, i32 [[TMP64]], i32 12			; CHECK-NEXT: [[TMP74:%.]] = getelementptr inbounds i32, i32 [[TMP64]], i32 12
	; CHECK-NEXT: [[TMP75:%.]] = bitcast i32 [[TMP74]] to <4 x i32>*			; CHECK-NEXT: [[TMP75:%.]] = bitcast i32 [[TMP74]] to <4 x i32>*
	; CHECK-NEXT: [[WIDE_LOAD6:%.]] = load <4 x i32>, <4 x i32> [[TMP75]], align 4			; CHECK-NEXT: [[WIDE_MASKED_LOAD6:%.]] = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32> [[TMP75]], i32 4, <4 x i1> [[TMP63]], <4 x i32> poison)
	; CHECK-NEXT: [[TMP76:%.*]] = xor <4 x i1> [[TMP39]], <i1 true, i1 true, i1 true, i1 true>			; CHECK-NEXT: [[TMP76:%.*]] = xor <4 x i1> [[TMP39]], <i1 true, i1 true, i1 true, i1 true>
	; CHECK-NEXT: [[TMP77:%.*]] = xor <4 x i1> [[TMP47]], <i1 true, i1 true, i1 true, i1 true>			; CHECK-NEXT: [[TMP77:%.*]] = xor <4 x i1> [[TMP47]], <i1 true, i1 true, i1 true, i1 true>
	; CHECK-NEXT: [[TMP78:%.*]] = xor <4 x i1> [[TMP55]], <i1 true, i1 true, i1 true, i1 true>			; CHECK-NEXT: [[TMP78:%.*]] = xor <4 x i1> [[TMP55]], <i1 true, i1 true, i1 true, i1 true>
	; CHECK-NEXT: [[TMP79:%.*]] = xor <4 x i1> [[TMP63]], <i1 true, i1 true, i1 true, i1 true>			; CHECK-NEXT: [[TMP79:%.*]] = xor <4 x i1> [[TMP63]], <i1 true, i1 true, i1 true, i1 true>
	; CHECK-NEXT: [[PREDPHI:%.*]] = select <4 x i1> [[TMP39]], <4 x i32> [[WIDE_LOAD]], <4 x i32> zeroinitializer			; CHECK-NEXT: [[PREDPHI:%.*]] = select <4 x i1> [[TMP39]], <4 x i32> [[WIDE_MASKED_LOAD]], <4 x i32> zeroinitializer
	; CHECK-NEXT: [[PREDPHI7:%.*]] = select <4 x i1> [[TMP47]], <4 x i32> [[WIDE_LOAD4]], <4 x i32> zeroinitializer			; CHECK-NEXT: [[PREDPHI7:%.*]] = select <4 x i1> [[TMP47]], <4 x i32> [[WIDE_MASKED_LOAD4]], <4 x i32> zeroinitializer
	; CHECK-NEXT: [[PREDPHI8:%.*]] = select <4 x i1> [[TMP55]], <4 x i32> [[WIDE_LOAD5]], <4 x i32> zeroinitializer			; CHECK-NEXT: [[PREDPHI8:%.*]] = select <4 x i1> [[TMP55]], <4 x i32> [[WIDE_MASKED_LOAD5]], <4 x i32> zeroinitializer
	; CHECK-NEXT: [[PREDPHI9:%.*]] = select <4 x i1> [[TMP63]], <4 x i32> [[WIDE_LOAD6]], <4 x i32> zeroinitializer			; CHECK-NEXT: [[PREDPHI9:%.*]] = select <4 x i1> [[TMP63]], <4 x i32> [[WIDE_MASKED_LOAD6]], <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP80]] = add <4 x i32> [[VEC_PHI]], [[PREDPHI]]			; CHECK-NEXT: [[TMP80]] = add <4 x i32> [[VEC_PHI]], [[PREDPHI]]
	; CHECK-NEXT: [[TMP81]] = add <4 x i32> [[VEC_PHI1]], [[PREDPHI7]]			; CHECK-NEXT: [[TMP81]] = add <4 x i32> [[VEC_PHI1]], [[PREDPHI7]]
	; CHECK-NEXT: [[TMP82]] = add <4 x i32> [[VEC_PHI2]], [[PREDPHI8]]			; CHECK-NEXT: [[TMP82]] = add <4 x i32> [[VEC_PHI2]], [[PREDPHI8]]
	; CHECK-NEXT: [[TMP83]] = add <4 x i32> [[VEC_PHI3]], [[PREDPHI9]]			; CHECK-NEXT: [[TMP83]] = add <4 x i32> [[VEC_PHI3]], [[PREDPHI9]]
	; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 16			; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 16
	; CHECK-NEXT: [[TMP84:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096			; CHECK-NEXT: [[TMP84:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
	; CHECK-NEXT: br i1 [[TMP84]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP24:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP84]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP24:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	▲ Show 20 Lines • Show All 134 Lines • ▼ Show 20 Lines
	; CHECK-NEXT: [[TMP62:%.*]] = insertelement <4 x i1> [[TMP61]], i1 [[TMP58]], i32 2			; CHECK-NEXT: [[TMP62:%.*]] = insertelement <4 x i1> [[TMP61]], i1 [[TMP58]], i32 2
	; CHECK-NEXT: [[TMP63:%.*]] = insertelement <4 x i1> [[TMP62]], i1 [[TMP59]], i32 3			; CHECK-NEXT: [[TMP63:%.*]] = insertelement <4 x i1> [[TMP62]], i1 [[TMP59]], i32 3
	; CHECK-NEXT: [[TMP64:%.]] = getelementptr inbounds i32, i32 [[BASE]], i64 [[TMP0]]			; CHECK-NEXT: [[TMP64:%.]] = getelementptr inbounds i32, i32 [[BASE]], i64 [[TMP0]]
	; CHECK-NEXT: [[TMP65:%.]] = getelementptr inbounds i32, i32 [[BASE]], i64 [[TMP4]]			; CHECK-NEXT: [[TMP65:%.]] = getelementptr inbounds i32, i32 [[BASE]], i64 [[TMP4]]
	; CHECK-NEXT: [[TMP66:%.]] = getelementptr inbounds i32, i32 [[BASE]], i64 [[TMP8]]			; CHECK-NEXT: [[TMP66:%.]] = getelementptr inbounds i32, i32 [[BASE]], i64 [[TMP8]]
	; CHECK-NEXT: [[TMP67:%.]] = getelementptr inbounds i32, i32 [[BASE]], i64 [[TMP12]]			; CHECK-NEXT: [[TMP67:%.]] = getelementptr inbounds i32, i32 [[BASE]], i64 [[TMP12]]
	; CHECK-NEXT: [[TMP68:%.]] = getelementptr inbounds i32, i32 [[TMP64]], i32 0			; CHECK-NEXT: [[TMP68:%.]] = getelementptr inbounds i32, i32 [[TMP64]], i32 0
	; CHECK-NEXT: [[TMP69:%.]] = bitcast i32 [[TMP68]] to <4 x i32>*			; CHECK-NEXT: [[TMP69:%.]] = bitcast i32 [[TMP68]] to <4 x i32>*
	; CHECK-NEXT: [[WIDE_LOAD:%.]] = load <4 x i32>, <4 x i32> [[TMP69]], align 4			; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.]] = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32> [[TMP69]], i32 4, <4 x i1> [[TMP39]], <4 x i32> poison)
	; CHECK-NEXT: [[TMP70:%.]] = getelementptr inbounds i32, i32 [[TMP64]], i32 4			; CHECK-NEXT: [[TMP70:%.]] = getelementptr inbounds i32, i32 [[TMP64]], i32 4
	; CHECK-NEXT: [[TMP71:%.]] = bitcast i32 [[TMP70]] to <4 x i32>*			; CHECK-NEXT: [[TMP71:%.]] = bitcast i32 [[TMP70]] to <4 x i32>*
	; CHECK-NEXT: [[WIDE_LOAD4:%.]] = load <4 x i32>, <4 x i32> [[TMP71]], align 4			; CHECK-NEXT: [[WIDE_MASKED_LOAD4:%.]] = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32> [[TMP71]], i32 4, <4 x i1> [[TMP47]], <4 x i32> poison)
	; CHECK-NEXT: [[TMP72:%.]] = getelementptr inbounds i32, i32 [[TMP64]], i32 8			; CHECK-NEXT: [[TMP72:%.]] = getelementptr inbounds i32, i32 [[TMP64]], i32 8
	; CHECK-NEXT: [[TMP73:%.]] = bitcast i32 [[TMP72]] to <4 x i32>*			; CHECK-NEXT: [[TMP73:%.]] = bitcast i32 [[TMP72]] to <4 x i32>*
	; CHECK-NEXT: [[WIDE_LOAD5:%.]] = load <4 x i32>, <4 x i32> [[TMP73]], align 4			; CHECK-NEXT: [[WIDE_MASKED_LOAD5:%.]] = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32> [[TMP73]], i32 4, <4 x i1> [[TMP55]], <4 x i32> poison)
	; CHECK-NEXT: [[TMP74:%.]] = getelementptr inbounds i32, i32 [[TMP64]], i32 12			; CHECK-NEXT: [[TMP74:%.]] = getelementptr inbounds i32, i32 [[TMP64]], i32 12
	; CHECK-NEXT: [[TMP75:%.]] = bitcast i32 [[TMP74]] to <4 x i32>*			; CHECK-NEXT: [[TMP75:%.]] = bitcast i32 [[TMP74]] to <4 x i32>*
	; CHECK-NEXT: [[WIDE_LOAD6:%.]] = load <4 x i32>, <4 x i32> [[TMP75]], align 4			; CHECK-NEXT: [[WIDE_MASKED_LOAD6:%.]] = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32> [[TMP75]], i32 4, <4 x i1> [[TMP63]], <4 x i32> poison)
	; CHECK-NEXT: [[TMP76:%.*]] = xor <4 x i1> [[TMP39]], <i1 true, i1 true, i1 true, i1 true>			; CHECK-NEXT: [[TMP76:%.*]] = xor <4 x i1> [[TMP39]], <i1 true, i1 true, i1 true, i1 true>
	; CHECK-NEXT: [[TMP77:%.*]] = xor <4 x i1> [[TMP47]], <i1 true, i1 true, i1 true, i1 true>			; CHECK-NEXT: [[TMP77:%.*]] = xor <4 x i1> [[TMP47]], <i1 true, i1 true, i1 true, i1 true>
	; CHECK-NEXT: [[TMP78:%.*]] = xor <4 x i1> [[TMP55]], <i1 true, i1 true, i1 true, i1 true>			; CHECK-NEXT: [[TMP78:%.*]] = xor <4 x i1> [[TMP55]], <i1 true, i1 true, i1 true, i1 true>
	; CHECK-NEXT: [[TMP79:%.*]] = xor <4 x i1> [[TMP63]], <i1 true, i1 true, i1 true, i1 true>			; CHECK-NEXT: [[TMP79:%.*]] = xor <4 x i1> [[TMP63]], <i1 true, i1 true, i1 true, i1 true>
	; CHECK-NEXT: [[PREDPHI:%.*]] = select <4 x i1> [[TMP39]], <4 x i32> [[WIDE_LOAD]], <4 x i32> zeroinitializer			; CHECK-NEXT: [[PREDPHI:%.*]] = select <4 x i1> [[TMP39]], <4 x i32> [[WIDE_MASKED_LOAD]], <4 x i32> zeroinitializer
	; CHECK-NEXT: [[PREDPHI7:%.*]] = select <4 x i1> [[TMP47]], <4 x i32> [[WIDE_LOAD4]], <4 x i32> zeroinitializer			; CHECK-NEXT: [[PREDPHI7:%.*]] = select <4 x i1> [[TMP47]], <4 x i32> [[WIDE_MASKED_LOAD4]], <4 x i32> zeroinitializer
	; CHECK-NEXT: [[PREDPHI8:%.*]] = select <4 x i1> [[TMP55]], <4 x i32> [[WIDE_LOAD5]], <4 x i32> zeroinitializer			; CHECK-NEXT: [[PREDPHI8:%.*]] = select <4 x i1> [[TMP55]], <4 x i32> [[WIDE_MASKED_LOAD5]], <4 x i32> zeroinitializer
	; CHECK-NEXT: [[PREDPHI9:%.*]] = select <4 x i1> [[TMP63]], <4 x i32> [[WIDE_LOAD6]], <4 x i32> zeroinitializer			; CHECK-NEXT: [[PREDPHI9:%.*]] = select <4 x i1> [[TMP63]], <4 x i32> [[WIDE_MASKED_LOAD6]], <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP80]] = add <4 x i32> [[VEC_PHI]], [[PREDPHI]]			; CHECK-NEXT: [[TMP80]] = add <4 x i32> [[VEC_PHI]], [[PREDPHI]]
	; CHECK-NEXT: [[TMP81]] = add <4 x i32> [[VEC_PHI1]], [[PREDPHI7]]			; CHECK-NEXT: [[TMP81]] = add <4 x i32> [[VEC_PHI1]], [[PREDPHI7]]
	; CHECK-NEXT: [[TMP82]] = add <4 x i32> [[VEC_PHI2]], [[PREDPHI8]]			; CHECK-NEXT: [[TMP82]] = add <4 x i32> [[VEC_PHI2]], [[PREDPHI8]]
	; CHECK-NEXT: [[TMP83]] = add <4 x i32> [[VEC_PHI3]], [[PREDPHI9]]			; CHECK-NEXT: [[TMP83]] = add <4 x i32> [[VEC_PHI3]], [[PREDPHI9]]
	; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 16			; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 16
	; CHECK-NEXT: [[TMP84:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096			; CHECK-NEXT: [[TMP84:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
	; CHECK-NEXT: br i1 [[TMP84]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP26:![0-9]+]]			; CHECK-NEXT: br i1 [[TMP84]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP26:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	▲ Show 20 Lines • Show All 238 Lines • Show Last 20 Lines