This is an archive of the discontinued LLVM Phabricator instance.

Differential D35498

[LoopVectorizer] Use two step casting for float to pointer types.
ClosedPublic

Authored by manojgupta on Jul 17 2017, 12:14 PM.

Details

Reviewers

mkuper
Ayal
dlj
rengolin
srhines

Commits

rG6b54c7e11beb: [LoopVectorizer] Use two step casting for float to pointer types.
rL312331: [LoopVectorizer] Use two step casting for float to pointer types.

Summary

LoopVectorizer is creating casts between vec<ptr> and vec<float> types
on ARM when compiling OpenCV. It is illegal to directly cast a
floating-point type to a pointer type, even if the types have the same
size, so this causes a crash. Fix the crash with a two-step cast:
bitcast to integer, then cast the integer to pointer/float.
Fixes PR33804.
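
As a rough source-level analogy for the two-step cast described in the summary, the sketch below moves bits between a pointer and a pointer-sized floating-point value through an integer of the same width. This is not the patch itself: `PtrSizedFloat`, `floatBitsToPointer` and `pointerToFloatBits` are hypothetical names chosen for illustration, and the sketch assumes the host has a `float` or `double` exactly as wide as a pointer and that passing such a value by value preserves its bit pattern.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Pick the floating-point type whose size matches a pointer on this host:
// float on typical 32-bit targets, double on typical 64-bit targets.
using PtrSizedFloat =
    std::conditional_t<sizeof(void *) == sizeof(float), float, double>;
static_assert(sizeof(PtrSizedFloat) == sizeof(void *),
              "sketch assumes a pointer-sized float or double exists");
static_assert(sizeof(std::uintptr_t) == sizeof(void *),
              "sketch assumes uintptr_t is exactly pointer-sized");

// Step 1: float -> integer of the same width (a pure bit copy).
// Step 2: integer -> pointer.
inline void *floatBitsToPointer(PtrSizedFloat F) {
  std::uintptr_t Bits;
  std::memcpy(&Bits, &F, sizeof(Bits));
  return reinterpret_cast<void *>(Bits);
}

// The reverse direction: pointer -> integer -> float.
inline PtrSizedFloat pointerToFloatBits(void *P) {
  assert(P != nullptr && "sketch only exercises non-null pointers");
  std::uintptr_t Bits = reinterpret_cast<std::uintptr_t>(P);
  PtrSizedFloat F;
  std::memcpy(&F, &Bits, sizeof(F));
  return F;
}
```

Neither function interprets the floating-point value numerically; the integer step only carries the bits across, which is the property the two-step cast relies on.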

Diff Detail

Build Status

Buildable 9461
Build 9461: arc lint + arc unit

Event Timeline

manojgupta created this revision. Jul 17 2017, 12:14 PM

Herald added subscribers: mzolotukhin, rengolin. Jul 17 2017, 12:14 PM

manojgupta added inline comments. Jul 17 2017, 12:25 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
2985	Loop vectorize crashes when trying to do the following cast: <4 x float> <float 0xC415AF1D80000000, float 0xC415AF1D80000000, float 0xC415AF1D80000000, float 0xC415AF1D80000000> to <4 x %struct.CvNode1D*>. As a floating-point type cannot be directly cast to a pointer type (even if the bit width is the same), the crash can be avoided using two bitcasts (float -> int and int -> pointer).

Setting aside from the fact that no one should ever be casting floating point to pointers, perhaps we shouldn't be even trying to do this transformation here, as I'm not sure this could have a guaranteed semantics in this case, and probably better to just bail the vectorisation?

Also, tests, etc. please.

In D35498#811759, @rengolin wrote:

Setting aside from the fact that no one should ever be casting floating point to pointers, perhaps we shouldn't be even trying to do this transformation here, as I'm not sure this could have a guaranteed semantics in this case, and probably better to just bail the vectorisation?

Also, tests, etc. please.

I agree that this transformation is unsafe and bailing out is the preferred way. If only I knew where to add that check.

In D35498#811759, @rengolin wrote:

Setting aside from the fact that no one should ever be casting floating point to pointers, perhaps we shouldn't be even trying to do this transformation here, as I'm not sure this could have a guaranteed semantics in this case, and probably better to just bail the vectorisation?

I'm not sure about this. What's happening here, if I understand correctly, is that we have a struct:

typedef struct CvNode1D
{
    float val;
    struct CvNode1D *next;
}
CvNode1D;

And we're trying to vectorize code that loads these structs. Since the pointer and the float have the same width, we can load four of the structs as, e.g. 2 * <4 x float>, and then use shuffles to get a vector of 4 floats and a vector of 4 pointers.
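
To make the interleaved load above concrete, here is a scalar sketch of what the wide load plus stride-2 shuffles compute for four nodes. It is only an illustration under stated assumptions: `Node32`, `Deinterleaved` and `loadFourNodes` are hypothetical names, and the 32-bit pointer field is modelled as a `uint32_t` so the layout does not depend on the host.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical 32-bit layout of CvNode1D: a float followed by a 32-bit
// "pointer", modelled as uint32_t so the sketch is host-independent.
struct Node32 {
  float val;
  std::uint32_t next;
};
static_assert(sizeof(float) == 4, "sketch assumes a 32-bit float");
static_assert(sizeof(Node32) == 8, "two 32-bit fields, no padding");

constexpr std::size_t VF = 4; // nodes handled per vector iteration

struct Deinterleaved {
  std::array<float, VF> vals;          // lanes 0, 2, 4, 6 of the wide load
  std::array<std::uint32_t, VF> nexts; // lanes 1, 3, 5, 7 of the wide load
};

inline Deinterleaved loadFourNodes(const Node32 *Nodes) {
  assert(Nodes != nullptr);
  // Wide load: treat the four structs as eight untyped 32-bit lanes.
  std::array<std::uint32_t, 2 * VF> Wide;
  std::memcpy(Wide.data(), Nodes, sizeof(Wide));

  Deinterleaved Out;
  for (std::size_t I = 0; I < VF; ++I) {
    // Stride-2 "shuffle": even lanes hold floats, odd lanes hold pointers.
    std::memcpy(&Out.vals[I], &Wide[2 * I], sizeof(float));
    Out.nexts[I] = Wide[2 * I + 1];
  }
  return Out;
}
```

The wide load has a single element type, so one of the two shuffled results always needs a reinterpretation of its bits; that reinterpretation is where the float/pointer cast in this review comes from.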

Anyway, I agree this isn't the right fix, but my gut feeling is that the right fix is to actually allow the builder to create a bitcast between a pointer and a pointer-sized float.
I don't see anything in the langref that makes the semantics of this undefined. Renato, what kind of semantic issues do you see here?

Also, tests, etc. please.

In D35498#811867, @mkuper wrote:
In D35498#811759, @rengolin wrote:

Setting aside from the fact that no one should ever be casting floating point to pointers, perhaps we shouldn't be even trying to do this transformation here, as I'm not sure this could have a guaranteed semantics in this case, and probably better to just bail the vectorisation?

I'm not sure about this. What's happening here, if I understand correctly, is that we have a struct:
typedef struct CvNode1D
{
    float val;
    struct CvNode1D *next;
}
CvNode1D;
And we're trying to vectorize code that loads these structs. Since the pointer and the float have the same width, we can load four of the structs as, e.g. 2 * <4 x float>, and then use shuffles to get a vector of 4 floats and a vector of 4 pointers.

Anyway, I agree this isn't the right fix, but my gut feeling is that the right fix is to actually allow the builder to create a bitcast between a pointer and a pointer-sized float.

Should I update the Builder's CreateBitOrPointerCast function to handle float to pointer casts using float -> int and int -> pointer casts? I can do that or I can create a local pass specific casting function to handle float -> ptr + other CreateBitOrPointerCast routines used here.

I don't see anything in the langref that makes the semantics of this undefined. Renato, what kind of semantic issues do you see here?

Also, tests, etc. please.

+1

I am trying to create a reduced test case but no success so far.

Should I update the Builder's CreateBitOrPointerCast function to handle float to pointer casts using float -> int and int -> pointer casts? I can do that or I can create a local pass specific casting function to handle float -> ptr + other CreateBitOrPointerCast routines used here.

Why can't you do the bitcast directly inside CreateBitOrPointerCast()? You'll also want to update isBitOrNoopPointerCastable().
Anyway, you probably want a different set of reviewers for that patch - I'm really not the authority on that, and at least Renato seems to have a completely different opinion.

I am trying to create a reduced test case but no success so far.

The standard way to do this is to dump the IR just before the vectorizer, and then use bugpoint to reduce.
Have you tried that?

In D35498#812053, @mkuper wrote:

I am trying to create a reduced test case but no success so far.

The standard way to do this is to dump the IR just before the vectorizer, and then use bugpoint to reduce.
Have you tried that?

I did that already here: https://llvm.org/PR33804#c5

In D35498#811867, @mkuper wrote:
typedef struct CvNode1D
{
    float val;
    struct CvNode1D *next;
}
CvNode1D;
And we're trying to vectorize code that loads these structs. Since the pointer and the float have the same width, we can load four of the structs as, e.g. 2 * <4 x float>, and then use shuffles to get a vector of 4 floats and a vector of 4 pointers.

Right, sorry, I missed the bug reference.

Anyway, I agree this isn't the right fix, but my gut feeling is that the right fix is to actually allow the builder to create a bitcast between a pointer and a pointer-sized float.
I don't see anything in the langref that makes the semantics of this undefined. Renato, what kind of semantic issues do you see here?

I agree the vectorizer can "safely" assume this case is ok because we know the original datatype was a pointer anyway and this is part of a strided load, but I feel we'd be opening a can of worms if we start allowing any float<->pointer conversion by default.

For example, a different case would be pointer->float->double->pointer. C has automatic promotions, and some corner cases may slip and create that sequence, which would destroy the bit-pattern and therefore the memory address.

So, if we can do this in this specific case, Manoj's current fix is "better" than moving it up CreateBitOrPointerCast, because we know what the semantics is. Or maybe we just load them as "data" (i32/i64?) and then bitcast safely?

Makes sense?
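
The promotion concern in the comment above can be illustrated with a small sketch. It assumes IEEE-754 `float` and `double`, and the helper names (`bitsToFloat`, `doubleToBits`, `throughPromotion`) are hypothetical; the 32-bit "address" is just an integer pattern.

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret a 32-bit pattern as a float (the "pointer -> float" step).
inline float bitsToFloat(std::uint32_t Bits) {
  float F;
  std::memcpy(&F, &Bits, sizeof(F));
  return F;
}

// Read back the raw 64-bit pattern of a double.
inline std::uint64_t doubleToBits(double D) {
  std::uint64_t Bits;
  std::memcpy(&Bits, &D, sizeof(Bits));
  return Bits;
}

// pointer bits -> float (bit copy) -> double (value conversion) -> bits.
inline std::uint64_t throughPromotion(std::uint32_t PtrBits) {
  float F = bitsToFloat(PtrBits);
  double D = F; // implicit promotion: preserves the value, not the bits
  return doubleToBits(D);
}
```

For the pattern 0x3F800000 (which reads as 1.0f), the promoted double is 0x3FF0000000000000, so neither 32-bit half matches the original pattern. A bit-preserving cast through an integer of the same width does not have this problem.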

Anyway, I agree this isn't the right fix, but my gut feeling is that the right fix is to actually allow the builder to create a bitcast between a pointer and a pointer-sized float.
I don't see anything in the langref that makes the semantics of this undefined. Renato, what kind of semantic issues do you see here?

I agree the vectorizer can "safely" assume this case is ok because we know the original datatype was a pointer anyway and this is part of a strided load, but I feel we'd be opening a can of worms if we start allowing any float<->pointer conversion by default.

For example, a different case would be pointer->float->double->pointer. C has automatic promotions, and some corner cases may slip and create that sequence, which would destroy the bit-pattern and therefore the memory address.

So, if we can do this in this specific case, Manoj's current fix is "better" than moving it up CreateBitOrPointerCast, because we know what the semantics is. Or maybe we just load them as "data" (i32/i64?) and then bitcast safely?

Makes sense?

First of all, sorry, you're right, I misread the langref - the cast really is illegal in IR, it's not a builder issue.
I really don't think it should be illegal - but a patch trying to fix a crash is probably not the right place to shave that yak.

I guess this is fine as a stop-gap. I think we need the other direction too, though. That is, it should cover both the pointer -> float and float -> pointer cases. (And have tests for both.)

In D35498#813120, @rengolin wrote:

... Or maybe we just load them as "data" (i32/i64?) and then bitcast safely?

Makes sense?

It does to me. The wide load and wide store are simply trying to move the packed bits together; each separate shuffle has its specific (possibly distinct) type.

(A follow-up issue may arise when attempting to interleave loads/stores of different sizes together; we're not there yet.)

A few additional comments below, mostly for completeness. They can be addressed by follow-up patches, if fixing the PR case (only) first is preferred.

lib/Transforms/Vectorize/LoopVectorize.cpp
2948	Same floating-point-vs-pointer type casting issue may hold for interleaved loads here as well, right?
2977–2978	While you're at it, please correct this typo: "... cast it to a[n] unified type". Can continue and comment here what this unified type may be.
2983	Or the other way around, when the last appearing store marking the insertion position of the final wide store has SubVT type of floating point, and another member has StoredVec type of a pointer. E.g., when the fields are swapped within the struct.
2994	At this stage assert that this 'else' is not reached, rather than return silently w/o generating the final wide store. Suggest to add a condition to Legal checking that types are compatible when forming interleave-groups.

Some cleanup, moved the casting to its own function
and added a test case.

Harbormaster completed remote builds in B8370: Diff 107194. Jul 18 2017, 3:46 PM

In D35498#813096, @Meinersbur wrote:

In D35498#812053, @mkuper wrote:

I am trying to create a reduced test case but no success so far.

The standard way to do this is to dump the IR just before the vectorizer, and then use bugpoint to reduce.
Have you tried that?

I did that already here: https://llvm.org/PR33804#c5

Thanks,
Added the test case to the review.

manojgupta marked 3 inline comments as done. Jul 18 2017, 3:49 PM

manojgupta added inline comments.

lib/Transforms/Vectorize/LoopVectorize.cpp
2994	Added assert in the createBitCast function.

Ayal added inline comments. Jul 18 2017, 11:36 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
2948	Thanks. This interleaved loads case requires a test.
2983	This swapped case requires a test.
3337	This function assumes V has vector type, having same number of elements as VTy, and both have elements of same size in bits. These properties can be asserted upfront. Suffice to check once if direct cast works and exit early if so. If not, assert one is a pointer and the other a floating point. Suffice to have only one copy of the code creating the two casts. Explain (in method declaration above) that we're bit casting from V to VTy, and/or use more informative variable names.
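
The control flow suggested in the comment on line 3337 can be sketched with a toy type model. None of the names below are LLVM API: `Kind`, `ToyVecTy`, `directlyCastable` and `planCast` are hypothetical stand-ins, and `directlyCastable` simply encodes the rule from this review that float <-> pointer is the one same-width pairing without a single cast.

```cpp
#include <cassert>
#include <vector>

// Toy stand-ins for LLVM types; none of these names are LLVM API.
enum class Kind { Int, Float, Ptr };
struct ToyVecTy {
  Kind Elem;
  unsigned ElemBits;
  unsigned NumElems;
};

// Models the IR rule discussed in the review: float <-> pointer is the only
// same-width pairing here that cannot be done with a single cast.
inline bool directlyCastable(Kind From, Kind To) {
  bool FloatPtr = (From == Kind::Float && To == Kind::Ptr) ||
                  (From == Kind::Ptr && To == Kind::Float);
  return !FloatPtr;
}

// Returns the sequence of types the value is cast to, ending at Dst.
inline std::vector<ToyVecTy> planCast(ToyVecTy Src, ToyVecTy Dst) {
  // Properties asserted up front.
  assert(Src.NumElems == Dst.NumElems && "element count mismatch");
  assert(Src.ElemBits == Dst.ElemBits && "element size mismatch");

  // Single check; exit early if a direct cast works.
  if (directlyCastable(Src.Elem, Dst.Elem))
    return {Dst};

  // Otherwise exactly one side is a pointer and the other a float.
  assert((Src.Elem == Kind::Ptr) != (Dst.Elem == Kind::Ptr));
  assert((Src.Elem == Kind::Float) != (Dst.Elem == Kind::Float));

  // One copy of the two-step path, via a same-width integer vector.
  ToyVecTy Mid{Kind::Int, Src.ElemBits, Src.NumElems};
  return {Mid, Dst};
}
```

In this model a float-to-pointer request yields two steps (integer vector, then pointer vector), while an integer-to-pointer request yields one.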

mkazantsev added a subscriber: mkazantsev. Jul 19 2017, 12:56 AM

mkazantsev added inline comments.

lib/Transforms/Vectorize/LoopVectorize.cpp
3355	You can sink these 3 lines and the last 3 lines of the "else" branch to after the "if" (if IntTy is declared above the if).

rengolin added inline comments. Jul 19 2017, 3:48 AM

test/Transforms/LoopVectorize/pr33804.ll
10 ↗	(On Diff #107194)	The ARM backend is not always compiled in on all bots, so this will break them. Please remove the triple. Keeping the datalayout may be enough; if not, move the tests to target-specific directories and make sure you cover at least two very different targets (like ARM32 and x86_64).
86 ↗	(On Diff #107194)	do you really need all those attributes? Try to remove them and see if they change the results. If not, just clean them up.
96 ↗	(On Diff #107194)	You probably don't need any of those either.

Meinersbur added inline comments. Jul 19 2017, 8:46 AM

test/Transforms/LoopVectorize/pr33804.ll
1 ↗	(On Diff #107194)	It looks like you are only CHECKing the IR output (`-S`). If this is the case, you can remove `-debug 2>&1` and `REQUIRES: asserts`. Mixing stderr and stdout is problematic because how they are interleaved is undefined.

Meinersbur added inline comments. Jul 19 2017, 9:58 AM

test/Transforms/LoopVectorize/pr33804.ll
19–58 ↗	(On Diff #107194)	I assume none of these are needed and bugpoint was just unable to remove invoke calls. You could try to remove them manually.

Back to it after a (long) break.

Added tests for both pointer->float and float->pointer cases.
Cleaned up most conditions into asserts.
Moved tests to CodeGen/ARM. The crash disappears without the ARM triple, so I can't remove it.
Simplified/cleaned up the tests.

Harbormaster completed remote builds in B9461: Diff 111914. Aug 20 2017, 4:59 PM

Herald added a subscriber: javed.absar. Aug 20 2017, 4:59 PM

Removed -debug and stderr usage from the test cases.

Harbormaster completed remote builds in B9464: Diff 111920. Aug 20 2017, 6:26 PM

rengolin added inline comments. Aug 21 2017, 8:37 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
590	NIT: I think you should keep the `createBitOrPointerCast` name, to make it clear.
3298	Could you just keep this version? Most of the code above will be nops anyway, no need to fast-track identical calls.
test/CodeGen/ARM/loopvectorize_pr33804_1.ll
19	Do you really need all the exception handling here? I imagine that the loop below would do just fine.
test/CodeGen/ARM/loopvectorize_pr33804_2.ll
19	Same here. Also, you could merge the two function into one single IR file.

Addressed the following comments:

Simplified the test cases and merged into a single file.
Renamed the function.
Kept a single call to direct bitcast.

Harbormaster completed remote builds in B9645: Diff 112740. Aug 25 2017, 1:47 PM

Updated summary.

Herald added subscribers: kristof.beyls, aemerson. Aug 25 2017, 3:08 PM

Harbormaster completed remote builds in B9650: Diff 112752. Aug 25 2017, 3:10 PM

Friendly ping for review.

Moved tests to CodeGen/ARM. The crash disappears without the ARM triple, so I can't remove it.

Is that because the stores don't get interleaved?

lib/Transforms/Vectorize/LoopVectorize.cpp
3287	Remove the first assert by turning dyn_cast<> into cast<>. Add a message to the second assert.
3290	Add a message to the assert.
3301	ditto
3303	check clang-format indentation.
test/CodeGen/ARM/loopvectorize_pr33804.ll
17 ↗	(On Diff #112752)	Best force vectorization width to 8 if that's the VF we expect; or expect a vector store of any width.
38 ↗	(On Diff #112752)	Explain how the second test differs from the first; i.e., the first requires casting the float value to be stored, into a pointer, and the second requires casting the pointer value into a float.
63 ↗	(On Diff #112752)	What about tests for float<->pointer conversions on interleaved groups of loads?

Sorry for the delay. It looks good to me now, thanks!

This revision is now accepted and ready to land. Aug 31 2017, 2:07 AM

Sorry, slight overlap. Please follow Ayal's comments before committing.

In D35498#857495, @rengolin wrote:

Sorry, slight overlap. Please follow Ayal's comments before committing.

My last comments are minor; I'm also ok with the patch after they are addressed.
Another final comment is to also have a test that compiles to 64 bits having a struct with double and pointer fields.

Added messages to asserts.
Fixed some indentation issues.
Added test case for double <-> pointer for AArch64. I could not reproduce this on x86_64.
Added more comments in unit tests to make it more clear.

Harbormaster completed remote builds in B9811: Diff 113454. Aug 31 2017, 12:35 PM

manojgupta marked 2 inline comments as done. Aug 31 2017, 12:42 PM

manojgupta added inline comments.

test/CodeGen/ARM/loopvectorize_pr33804.ll
17 ↗	(On Diff #112752)	Changed it to expect a vector store of any width since the concern is vectorizer should not crash, not the vectorized size.
63 ↗	(On Diff #112752)	Because of my limited knowledge of how loop vectorizer works, I am not yet able to create a test case for loads which triggers float <-> pointer casting. Will try to create a test case for loads in a follow up commit. I did add another test case for double <-> pointer for AArch64.

Ayal added inline comments. Sep 1 2017, 2:58 AM

test/CodeGen/ARM/loopvectorize_pr33804.ll
63 ↗	(On Diff #112752)	You can simply first load the two fields, say fadd 0.5 to one and do pointer++ using gep to the other, then store them back; or even just copy them from one array to another.
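
At the source level, the recipe in the comment above might look like the loop below. This is only a sketch: `Node` and `copyAndBump` are hypothetical names, and whether a given compiler actually forms interleaved load and store groups from it depends on the target and the cost model, so it is not a confirmed reproducer.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical node type mirroring the struct from the bug report.
struct Node {
  float val;
  Node *next;
};

// Load both fields, modify each, and store them to a second array, so the
// loop reads and writes both members of every struct it touches.
inline void copyAndBump(const Node *In, Node *Out, std::size_t N) {
  assert(In != Out && "sketch assumes distinct arrays");
  for (std::size_t I = 0; I < N; ++I) {
    float V = In[I].val;
    Node *P = In[I].next;
    Out[I].val = V + 0.5f; // the fadd 0.5 on the float field
    Out[I].next = P + 1;   // the pointer++ (a gep by one element)
  }
}
```

Each `next` pointer is assumed to point into an array with at least one more element (or one past the end being acceptable), so that `P + 1` is well defined.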

manojgupta marked an inline comment as done. Sep 1 2017, 6:44 AM

manojgupta added inline comments.

test/CodeGen/ARM/loopvectorize_pr33804.ll
63 ↗	(On Diff #112752)	Tried that but still no luck. Storing back to the same addresses or even to some globals inside/outside the loop didn't help. Is it OK to check this in first and try adding a test case for loads in a follow-up commit?

manojgupta closed this revision. Sep 1 2017, 8:37 AM

This was closed due to committing r312331, right? Code LGTM, for the record. Tests for interleaved loads of float/pointer should still be added, as this patch presumably handles them too.

I was able to generate the test cases for loads (https://reviews.llvm.org/D37967).

Revision Contents

Path	Size
lib/Transforms/Vectorize/LoopVectorize.cpp	46 lines
test/CodeGen/ARM/loopvectorize_pr33804_1.ll	90 lines
test/CodeGen/ARM/loopvectorize_pr33804_2.ll	90 lines

Diff 111914

lib/Transforms/Vectorize/LoopVectorize.cpp

[579 lines not shown] protected:
   virtual Value *reverseVector(Value *Vec);

   /// Returns (and creates if needed) the original loop trip count.
   Value *getOrCreateTripCount(Loop *NewLoop);

   /// Returns (and creates if needed) the trip count of the widened loop.
   Value *getOrCreateVectorTripCount(Loop *NewLoop);

+  /// Returns a bitcasted value to the requested vector type.
+  /// Also handles bitcasts of float <--> pointer types.
+  Value* createBitCast(Value *V, VectorType *DstVTy, const DataLayout& DL);
+
   /// Emit a bypass check to see if the vector trip count is zero, including if
   /// it overflows.
   void emitMinimumIterationCountCheck(Loop *L, BasicBlock *Bypass);
   /// Emit a bypass check to see if all of the SCEV assumptions we've
   /// had to make are correct.
   void emitSCEVChecks(Loop *L, BasicBlock *Bypass);
   /// Emit bypass checks to check any memory assumptions we may have made.
   void emitMemRuntimeChecks(Loop *L, BasicBlock *Bypass);
[2,265 lines not shown]
 void InnerLoopVectorizer::vectorizeInterleaveGroup(Instruction *Instr) {
   const InterleaveGroup *Group = Legal->getInterleavedAccessGroup(Instr);
   assert(Group && "Fail to get an interleaved access group.");

   // Skip if current instruction is not the insert position.
   if (Instr != Group->getInsertPos())
     return;

+  const DataLayout &DL = Instr->getModule()->getDataLayout();
   Value *Ptr = getPointerOperand(Instr);

   // Prepare for the vector type of the interleaved load/store.
   Type *ScalarTy = getMemInstValueType(Instr);
   unsigned InterleaveFactor = Group->getFactor();
   Type *VecTy = VectorType::get(ScalarTy, InterleaveFactor * VF);
   Type *PtrTy = VecTy->getPointerTo(getMemInstAddressSpace(Instr));

[58 lines not shown] for (unsigned I = 0; I < InterleaveFactor; ++I) {
       Constant *StrideMask = createStrideMask(Builder, I, InterleaveFactor, VF);
       for (unsigned Part = 0; Part < UF; Part++) {
         Value *StridedVec = Builder.CreateShuffleVector(
             NewLoads[Part], UndefVec, StrideMask, "strided.vec");

         // If this member has different type, cast the result type.
         if (Member->getType() != ScalarTy) {
           VectorType *OtherVTy = VectorType::get(Member->getType(), VF);
-          StridedVec = Builder.CreateBitOrPointerCast(StridedVec, OtherVTy);
+          StridedVec = createBitCast(StridedVec, OtherVTy, DL);
         }

         if (Group->isReverse())
           StridedVec = reverseVector(StridedVec);

         VectorLoopValueMap.setVectorValue(Member, Part, StridedVec);
       }
     }
[12 lines not shown] for (unsigned i = 0; i < InterleaveFactor; i++) {
       Instruction *Member = Group->getMember(i);
       assert(Member && "Fail to get a member from an interleaved store group");

       Value *StoredVec = getOrCreateVectorValue(
           cast<StoreInst>(Member)->getValueOperand(), Part);
       if (Group->isReverse())
         StoredVec = reverseVector(StoredVec);

-      // If this member has different type, cast it to an unified type.
+      // If this member has different type, cast it to a unified type.
+
       if (StoredVec->getType() != SubVT)
-        StoredVec = Builder.CreateBitOrPointerCast(StoredVec, SubVT);
+        StoredVec = createBitCast(StoredVec, SubVT, DL);

       StoredVecs.push_back(StoredVec);
     }

     // Concatenate all vectors into a wide vector.
     Value *WideVec = concatenateVectors(Builder, StoredVecs);

     // Interleave the elements in the wide vector.
     Constant *IMask = createInterleaveMask(Builder, VF, InterleaveFactor);
     Value *IVec = Builder.CreateShuffleVector(WideVec, UndefVec, IMask,
                                               "interleaved.vec");

     Instruction *NewStoreInstr =
         Builder.CreateAlignedStore(IVec, NewPtrs[Part], Group->getAlignment());
     addMetadata(NewStoreInstr, Instr);
   }
 }

 void InnerLoopVectorizer::vectorizeMemoryInstruction(Instruction *Instr) {
   // Attempt to issue a wide load.
   LoadInst *LI = dyn_cast<LoadInst>(Instr);
   StoreInst *SI = dyn_cast<StoreInst>(Instr);
[270 lines not shown] if (VF > 1 && Legal->requiresScalarEpilogue()) {
     R = Builder.CreateSelect(IsZero, Step, R);
   }

   VectorTripCount = Builder.CreateSub(TC, R, "n.vec");

   return VectorTripCount;
 }

+Value* InnerLoopVectorizer::createBitCast(Value *V, VectorType *DstVTy,
+                                          const DataLayout& DL) {
+  // Do a direct cast if a safe direct cast is possible.
+  if (CastInst::isBitOrNoopPointerCastable(V->getType(), DstVTy, DL)) {
+    return Builder.CreateBitOrPointerCast(V, DstVTy);
+  }
+  // Verify that V is a vector type with same number of elements as DstVTy.
+  unsigned VF = DstVTy->getNumElements();
+  VectorType *SrcVecTy = dyn_cast<VectorType>(V->getType());
+  assert(SrcVecTy);
+  assert(VF == SrcVecTy->getNumElements());
+  Type *SrcElemTy = SrcVecTy->getElementType();
+  Type *DstElemTy = DstVTy->getElementType();
+  assert(DL.getTypeSizeInBits(SrcElemTy) == DL.getTypeSizeInBits(DstElemTy));
+
+  // The previous castable check does not cover the bitcasts between
+  // vector<int> and vector<ptr> types and may fail. So try another time
+  // but using element types.
+  if (CastInst::isBitOrNoopPointerCastable(SrcElemTy, DstElemTy, DL)) {
+    return Builder.CreateBitOrPointerCast(V, DstVTy);
+  }
+  // V cannot be directly casted to desired vector type.
+  // May happen when V is a floating point vector but DstVTy is a vector of pointers
+  // or vice-versa. Handle this using a two-step bitcast using an intermediate Integer
+  // type for the bitcast i.e. Ptr <-> Int <-> Float.
+  assert(DstElemTy->isPointerTy() != SrcElemTy->isPointerTy());
+  assert(DstElemTy->isFloatingPointTy() != SrcElemTy->isFloatingPointTy());
+  Type *IntTy = IntegerType::getIntNTy(V->getContext(),
+                                       DL.getTypeSizeInBits(SrcElemTy));
+  VectorType *VecIntTy = VectorType::get(IntTy, VF);
+  Value *CastVal = Builder.CreateBitOrPointerCast(V, VecIntTy);
+  return Builder.CreateBitOrPointerCast(CastVal, DstVTy);
+}

void InnerLoopVectorizer::emitMinimumIterationCountCheck(Loop *L,
                                                         BasicBlock *Bypass) {
  Value *Count = getOrCreateTripCount(L);
  BasicBlock *BB = L->getLoopPreheader();
  IRBuilder<> Builder(BB->getTerminator());

  // Generate code to check if the loop's trip count is less than VF * UF, or
  // equal to it in case a scalar epilogue is required; this implies that the
  // vector trip count is zero. This check also covers the case where adding one
  // to the backedge-taken count overflowed leading to an incorrect trip count
  // of zero. In this case we will also jump to the scalar loop.
  auto P = Legal->requiresScalarEpilogue() ? ICmpInst::ICMP_ULE
                                           : ICmpInst::ICMP_ULT;
  Value *CheckMinIters = Builder.CreateICmp(
      P, Count, ConstantInt::get(Count->getType(), VF * UF), "min.iters.check");

  BasicBlock *NewBB = BB->splitBasicBlock(BB->getTerminator(), "vector.ph");
  // Update dominator tree immediately if the generated block is a
  // LoopBypassBlock because SCEV expansions to generate loop bypass
  // checks may query it before the current function is finished.
  DT->addNewBlock(NewBB, BB);
  if (L->getParentLoop())
    L->getParentLoop()->addBasicBlockToLoop(NewBB, *LI);
Ayal (inline comment): This function assumes V has vector type, having same number of elements as VTy, and both have elements of same size in bits. These properties can be asserted upfront. Suffice to check once if direct cast works and exit early if so. If not, assert one is a pointer and the other a floating point. Suffice to have only one copy of the code creating the two casts. Explain (in method declaration above) that we're bit casting from V to VTy, and/or use more informative variable names.
  ReplaceInstWithInst(BB->getTerminator(),
                      BranchInst::Create(Bypass, NewBB, CheckMinIters));
  LoopBypassBlocks.push_back(BB);
}
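The comparison built in `emitMinimumIterationCountCheck` can be restated as plain arithmetic. The sketch below is illustrative only: `bypassVectorLoop` is a hypothetical helper, not a function in LoopVectorize.cpp, and it assumes the trip count fits in 64 bits. It returns true when the check would branch to the scalar loop.

```cpp
#include <cstdint>

// Hypothetical helper mirroring the predicate chosen above:
// ICMP_ULE when a scalar epilogue is required, ICMP_ULT otherwise.
bool bypassVectorLoop(std::uint64_t tripCount, unsigned VF, unsigned UF,
                      bool requiresScalarEpilogue) {
  // Number of scalar iterations covered by one vector-loop iteration.
  std::uint64_t step = static_cast<std::uint64_t>(VF) * UF;
  // With a required scalar epilogue, a trip count equal to VF * UF leaves
  // no full vector iteration, so equality must bypass as well.
  return requiresScalarEpilogue ? tripCount <= step : tripCount < step;
}
```

With VF = 4 and UF = 2, a trip count of 7 bypasses the vector loop in both modes, while a trip count of 8 bypasses it only when a scalar epilogue is required. A trip count of 0, which is what an overflowed backedge-taken count produces, always bypasses.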

void InnerLoopVectorizer::emitSCEVChecks(Loop *L, BasicBlock *Bypass) {
  BasicBlock *BB = L->getLoopPreheader();

  // Generate the code to check that the SCEV assumptions that we made.
  // We want the new basic block to start at the first instruction in a
  // sequence of instructions that form a check.
  SCEVExpander Exp(*PSE.getSE(), Bypass->getModule()->getDataLayout(),
                   "scev.check");
  Value *SCEVCheck =
      Exp.expandCodeForPredicate(&PSE.getUnionPredicate(), BB->getTerminator());

  if (auto *C = dyn_cast<ConstantInt>(SCEVCheck))
    if (C->isZero())
mkazantsev (inline comment): You can sink these 3 lines and last 3 lines of "else" branch to after the "if" (if IntTy is declared above the if).
      return;

  // Create a new block containing the stride check.
  BB->setName("vector.scevcheck");
  auto *NewBB = BB->splitBasicBlock(BB->getTerminator(), "vector.ph");
  // Update dominator tree immediately if the generated block is a
  // LoopBypassBlock because SCEV expansions to generate loop bypass
  // checks may query it before the current function is finished.

test/CodeGen/ARM/loopvectorize_pr33804_1.ll

This file was added.

				; RUN: opt -loop-vectorize -debug -S < %s 2>&1 | FileCheck %s

				; This checks we don't crash when vectorizing if vectorizer ends up
				; requiring casting float to a pointer type.

				; ModuleID = 'bugpoint-reduced-simplified.bc'
				source_filename = "bugpoint-output-26dbd81.bc"
				target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64"
				target triple = "armv7--linux-gnueabihf"

				%struct.CvNode1D = type { float, %struct.CvNode1D* }

				@.str.13 = external unnamed_addr constant [1 x i8], align 1

				; CHECK-LABEL: @cvCalcEMD2
				; CHECK: vector.body
				; CHECK: store <8 x %struct.CvNode1D*>
				define void @cvCalcEMD2() local_unnamed_addr #0 personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*) {
				entry:
rengolin (inline comment): Do you really need all the exception handling here? I imagine that the loop below would do just fine.
				invoke void @cvGetMat()
				to label %invoke.cont unwind label %lpad.loopexit.split-lp

				invoke.cont: ; preds = %entry
				invoke void @cvGetMat()
				to label %invoke.cont3 unwind label %lpad.loopexit.split-lp

				invoke.cont3: ; preds = %invoke.cont
				invoke void @_Znaj() #3
				to label %call.i.i.i1408.noexc unwind label %lpad.loopexit.split-lp

				lpad.loopexit.split-lp: ; preds = %invoke.cont3, %entry, %invoke.cont
				%lpad.loopexit.split-lp2387 = landingpad { i8*, i32 }
				cleanup
				resume { i8*, i32 } undef

				call.i.i.i1408.noexc: ; preds = %invoke.cont3
				invoke void @_ZNSsC1EPKcRKSaIcE()
				to label %invoke.cont188.i unwind label %lpad187.i

				invoke.cont188.i: ; preds = %call.i.i.i1408.noexc
				br label %invoke.cont203.i

				invoke.cont203.i: ; preds = %invoke.cont188.i
				invoke void @_ZN2cv5errorERKNS_9ExceptionE()
				to label %invoke.cont206.i unwind label %lpad205.i

				invoke.cont206.i: ; preds = %invoke.cont203.i
				br label %for.body14.i.i

				lpad187.i: ; preds = %call.i.i.i1408.noexc
				%0 = landingpad { i8*, i32 }
				cleanup
				unreachable

				lpad205.i: ; preds = %invoke.cont203.i
				%1 = landingpad { i8*, i32 }
				cleanup
				unreachable

				for.body14.i.i: ; preds = %for.body14.i.i, %invoke.cont206.i
				%i.1424.i.i = phi i32 [ %inc21.i.i, %for.body14.i.i ], [ 0, %invoke.cont206.i ]
				%arrayidx15.i.i1427 = getelementptr inbounds %struct.CvNode1D, %struct.CvNode1D* undef, i32 %i.1424.i.i
				%val.i.i = getelementptr inbounds %struct.CvNode1D, %struct.CvNode1D* %arrayidx15.i.i1427, i32 0, i32 0
				store float 0xC415AF1D80000000, float* %val.i.i, align 4
				%next19.i.i = getelementptr inbounds %struct.CvNode1D, %struct.CvNode1D* undef, i32 %i.1424.i.i, i32 1
				store %struct.CvNode1D* undef, %struct.CvNode1D** %next19.i.i, align 4
				%inc21.i.i = add nuw nsw i32 %i.1424.i.i, 1
				%exitcond438.i.i = icmp eq i32 %inc21.i.i, 0
				br i1 %exitcond438.i.i, label %for.end22.i.i, label %for.body14.i.i

				for.end22.i.i: ; preds = %for.body14.i.i
				unreachable
				}

				declare void @cvGetMat() local_unnamed_addr #1

				declare i32 @__gxx_personality_v0(...)

				declare void @_ZN2cv5errorERKNS_9ExceptionE() local_unnamed_addr #1

				declare void @_ZNSsC1EPKcRKSaIcE() unnamed_addr #1

				; Function Attrs: nobuiltin
				declare void @_Znaj() local_unnamed_addr #2

				attributes #0 = { "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="generic" "target-features"="+dsp,+neon,+vfp3,-thumb-mode" "unsafe-fp-math"="false" "use-soft-float"="false" }
				attributes #1 = { "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="generic" "target-features"="+dsp,+neon,+vfp3,-thumb-mode" "unsafe-fp-math"="false" "use-soft-float"="false" }
				attributes #2 = { nobuiltin "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="generic" "target-features"="+dsp,+neon,+vfp3,-thumb-mode" "unsafe-fp-math"="false" "use-soft-float"="false" }
				attributes #3 = { builtin }

test/CodeGen/ARM/loopvectorize_pr33804_2.ll

This file was added.

				; RUN: opt -loop-vectorize -debug -S < %s 2>&1 | FileCheck %s

				; This checks we don't crash when vectorizing if vectorizer ends up
				; requiring casting pointer to a float type.

				; ModuleID = 'bugpoint-reduced-simplified.bc'
				source_filename = "bugpoint-output-26dbd81.bc"
				target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64"
				target triple = "armv7--linux-gnueabihf"

				%struct.CvNode1D = type { %struct.CvNode1D*, float }

				@.str.13 = external unnamed_addr constant [1 x i8], align 1

				; CHECK-LABEL: @cvCalcEMD2
				; CHECK: vector.body
				; CHECK: store <8 x float>
				define void @cvCalcEMD2() local_unnamed_addr #0 personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*) {
				entry:
rengolin (inline comment): Same here. Also, you could merge the two functions into one single IR file.
				invoke void @cvGetMat()
				to label %invoke.cont unwind label %lpad.loopexit.split-lp

				invoke.cont: ; preds = %entry
				invoke void @cvGetMat()
				to label %invoke.cont3 unwind label %lpad.loopexit.split-lp

				invoke.cont3: ; preds = %invoke.cont
				invoke void @_Znaj() #3
				to label %call.i.i.i1408.noexc unwind label %lpad.loopexit.split-lp

				lpad.loopexit.split-lp: ; preds = %invoke.cont3, %entry, %invoke.cont
				%lpad.loopexit.split-lp2387 = landingpad { i8*, i32 }
				cleanup
				resume { i8*, i32 } undef

				call.i.i.i1408.noexc: ; preds = %invoke.cont3
				invoke void @_ZNSsC1EPKcRKSaIcE()
				to label %invoke.cont188.i unwind label %lpad187.i

				invoke.cont188.i: ; preds = %call.i.i.i1408.noexc
				br label %invoke.cont203.i

				invoke.cont203.i: ; preds = %invoke.cont188.i
				invoke void @_ZN2cv5errorERKNS_9ExceptionE()
				to label %invoke.cont206.i unwind label %lpad205.i

				invoke.cont206.i: ; preds = %invoke.cont203.i
				br label %for.body14.i.i

				lpad187.i: ; preds = %call.i.i.i1408.noexc
				%0 = landingpad { i8*, i32 }
				cleanup
				unreachable

				lpad205.i: ; preds = %invoke.cont203.i
				%1 = landingpad { i8*, i32 }
				cleanup
				unreachable

				for.body14.i.i: ; preds = %for.body14.i.i, %invoke.cont206.i
				%i.1424.i.i = phi i32 [ %inc21.i.i, %for.body14.i.i ], [ 0, %invoke.cont206.i ]
				%next19.i.i = getelementptr inbounds %struct.CvNode1D, %struct.CvNode1D* undef, i32 %i.1424.i.i, i32 0
				store %struct.CvNode1D* undef, %struct.CvNode1D** %next19.i.i, align 4
				%arrayidx15.i.i1427 = getelementptr inbounds %struct.CvNode1D, %struct.CvNode1D* undef, i32 %i.1424.i.i
				%val.i.i = getelementptr inbounds %struct.CvNode1D, %struct.CvNode1D* %arrayidx15.i.i1427, i32 0, i32 1
				store float 0xC415AF1D80000000, float* %val.i.i, align 4
				%inc21.i.i = add nuw nsw i32 %i.1424.i.i, 1
				%exitcond438.i.i = icmp eq i32 %inc21.i.i, 0
				br i1 %exitcond438.i.i, label %for.end22.i.i, label %for.body14.i.i

				for.end22.i.i: ; preds = %for.body14.i.i
				unreachable
				}

				declare void @cvGetMat() local_unnamed_addr #1

				declare i32 @__gxx_personality_v0(...)

				declare void @_ZN2cv5errorERKNS_9ExceptionE() local_unnamed_addr #1

				declare void @_ZNSsC1EPKcRKSaIcE() unnamed_addr #1

				; Function Attrs: nobuiltin
				declare void @_Znaj() local_unnamed_addr #2

				attributes #0 = { "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="generic" "target-features"="+dsp,+neon,+vfp3,-thumb-mode" "unsafe-fp-math"="false" "use-soft-float"="false" }
				attributes #1 = { "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="generic" "target-features"="+dsp,+neon,+vfp3,-thumb-mode" "unsafe-fp-math"="false" "use-soft-float"="false" }
				attributes #2 = { nobuiltin "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="generic" "target-features"="+dsp,+neon,+vfp3,-thumb-mode" "unsafe-fp-math"="false" "use-soft-float"="false" }
				attributes #3 = { builtin }

Revision Contents

Diff 111914
