This is an archive of the discontinued LLVM Phabricator instance.

[LoopVectorizer] Use two step casting for float to pointer types.
ClosedPublic

Authored by manojgupta on Jul 17 2017, 12:14 PM.

Details

Summary

LoopVectorizer is creating casts between vec<ptr> and vec<float> types
on ARM when compiling OpenCV. Since, tIs is illegal to directly cast a
floating point type to a pointer type even if the types have same size
causing a crash. Fix the crash using a two-step casting by bitcasting
to integer and integer to pointer/float.
Fixes PR33804.

Event Timeline

manojgupta created this revision.Jul 17 2017, 12:14 PM
manojgupta added inline comments.Jul 17 2017, 12:25 PM
lib/Transforms/Vectorize/LoopVectorize.cpp
2985

Loop vectorize crashes when trying to do the following cast :
<4 x float> <float 0xC415AF1D80000000, float 0xC415AF1D80000000, float 0xC415AF1D80000000, float 0xC415AF1D80000000> to <4 x %struct.CvNode1D* >

As a floating point type cannot be directly casted to a pointer type (even if bitwidth is same), the crash can be avoided using two bitcasts (float->int and int-> pointer).

Setting aside from the fact that no one should ever be casting floating point to pointers, perhaps we shouldn't be even trying to do this transformation here, as I'm not sure this could have a guaranteed semantics in this case, and probably better to just bail the vectorisation?

Also, tests, etc. please.

Setting aside from the fact that no one should ever be casting floating point to pointers, perhaps we shouldn't be even trying to do this transformation here, as I'm not sure this could have a guaranteed semantics in this case, and probably better to just bail the vectorisation?

Also, tests, etc. please.

I agree that this transformation is unsafe and bailing out is the preferred way. Only if I knew where to add that check.

mkuper edited edge metadata.Jul 17 2017, 1:53 PM

Setting aside from the fact that no one should ever be casting floating point to pointers, perhaps we shouldn't be even trying to do this transformation here, as I'm not sure this could have a guaranteed semantics in this case, and probably better to just bail the vectorisation?

I'm not sure about this. What's happening here, if I understand correctly, is that we have a struct:

typedef struct CvNode1D
{
    float val;
    struct CvNode1D *next;
}
CvNode1D;

And we're trying to vectorize code that loads these structs. Since the pointer and the float have the same width, we can load four of the structs as, e.g. 2 * <4 x float>, and then use shuffles to get a vector of 4 floats and a vectors of 4 pointers.

Anyway, I agree this isn't the right fix, but my gut feeling is that the right fix is to actually allow the builder to create a bitcast between a pointer and a pointer-sized float.
I don't see anything in the langref that makes the semantics of this undefined. Renato, what kind of semantic issues do you see here?

Also, tests, etc. please.

+1

Setting aside from the fact that no one should ever be casting floating point to pointers, perhaps we shouldn't be even trying to do this transformation here, as I'm not sure this could have a guaranteed semantics in this case, and probably better to just bail the vectorisation?

I'm not sure about this. What's happening here, if I understand correctly, is that we have a struct:

typedef struct CvNode1D
{
    float val;
    struct CvNode1D *next;
}
CvNode1D;

And we're trying to vectorize code that loads these structs. Since the pointer and the float have the same width, we can load four of the structs as, e.g. 2 * <4 x float>, and then use shuffles to get a vector of 4 floats and a vectors of 4 pointers.

Anyway, I agree this isn't the right fix, but my gut feeling is that the right fix is to actually allow the builder to create a bitcast between a pointer and a pointer-sized float.

Should I update the Builder's CreateBitOrPointerCast function to handle float to pointer casts using float -> int and int -> pointer casts? I can do that or I can create a local pass specific casting function to handle float -> ptr + other CreateBitOrPointerCast routines used here.

I don't see anything in the langref that makes the semantics of this undefined. Renato, what kind of semantic issues do you see here?

Also, tests, etc. please.

+1

I am trying to create a reduced test case but no success so far.

Should I update the Builder's CreateBitOrPointerCast function to handle float to pointer casts using float -> int and int -> pointer casts? I can do that or I can create a local pass specific casting function to handle float -> ptr + other CreateBitOrPointerCast routines used here.

Why can't you do the bitcast directly inside CreateBitOrPointerCast()? You'll also want to upadte isBitOrNoopPointerCastable(),
Anyway, you probably want a different set of reviewers for that patch - I'm really not the authority on that, and at least Renato seems to have a completely different opinion.

I am trying to create a reduced test case but no success so far.

The standard way to do this is to dump the IR just before the vectorizer, and then use bugpoint to reduce.
Have you tried that?

I am trying to create a reduced test case but no success so far.

The standard way to do this is to dump the IR just before the vectorizer, and then use bugpoint to reduce.
Have you tried that?

I did that already here: https://llvm.org/PR33804#c5

typedef struct CvNode1D
{
    float val;
    struct CvNode1D *next;
}
CvNode1D;

And we're trying to vectorize code that loads these structs. Since the pointer and the float have the same width, we can load four of the structs as, e.g. 2 * <4 x float>, and then use shuffles to get a vector of 4 floats and a vectors of 4 pointers.

Right, sorry, I missed the bug reference.

Anyway, I agree this isn't the right fix, but my gut feeling is that the right fix is to actually allow the builder to create a bitcast between a pointer and a pointer-sized float.
I don't see anything in the langref that makes the semantics of this undefined. Renato, what kind of semantic issues do you see here?

I agree the vectorizer can "safely" assume this case is ok because we know the original datatype was a pointer anyway and this is part of a strided load, but I feel we'd be opening a can of worms if we start allowing any float<->pointer conversion by default.

For example, a different case would be pointer->float->double->pointer. C has automatic promotions, and some corner cases may slip and create that sequence, which would destroy the bit-pattern and therefore the memory address.

So, if we can do this in this specific case, Manoj's current fix is "better" than moving it up CreateBitOrPointerCast, because we know what the semantics is. Or maybe we just load them as "data" (i32/i64?) and then bitcast safely?

Makes sense?

Anyway, I agree this isn't the right fix, but my gut feeling is that the right fix is to actually allow the builder to create a bitcast between a pointer and a pointer-sized float.
I don't see anything in the langref that makes the semantics of this undefined. Renato, what kind of semantic issues do you see here?

I agree the vectorizer can "safely" assume this case is ok because we know the original datatype was a pointer anyway and this is part of a strided load, but I feel we'd be opening a can of worms if we start allowing any float<->pointer conversion by default.

For example, a different case would be pointer->float->double->pointer. C has automatic promotions, and some corner cases may slip and create that sequence, which would destroy the bit-pattern and therefore the memory address.

So, if we can do this in this specific case, Manoj's current fix is "better" than moving it up CreateBitOrPointerCast, because we know what the semantics is. Or maybe we just load them as "data" (i32/i64?) and then bitcast safely?

Makes sense?

First of all, sorry, you're right, I misread the langref - the cast really is illegal in IR, it's not a builder issue.
I really don't think it should be illegal - but a patch trying to fix a crash is probably not the right place to shave that yak,

I guess this is fine as a stop-gap. I think we need the other direction too, though. That is, should cover both the pointer -> float and float -> pointer cases. (And have tests for both.)

Ayal edited edge metadata.Jul 18 2017, 1:38 PM

... Or maybe we just load them as "data" (i32/i64?) and then bitcast safely?

Makes sense?

It does to me. The wide load and wide store are simply trying to move the packed bits together; each separate shuffle has its specific (possibly distinct) type.

(A follow-up issue may rise when attempting to interleave loads/stores of different sizes together; we're not there yet.)

A few additional comments below, mostly for completeness. They can be addressed by follow-up patches, if fixing the PR case (only) first is preferred.

lib/Transforms/Vectorize/LoopVectorize.cpp
2948

Same floating-point-vs-pointer type casting issue may hold for interleaved loads here as well, right?

2977–2978

While you're at it, please correct this typo: "... cast it to a[n] unified type". Can continue and comment here what this unified type may be.

2983

Or the other way around, when the last appearing store marking the insertion position of the final wide store has SubVT type of floating point, and another member has StoredVec type of a pointer. E.g., when the fields are swapped within the struct.

2994

At this stage assert that this 'else' is not reached, rather than return silently w/o generating the final wide store.

Suggest to add a condition to Legal checking that types are compatible when forming interleave-groups.

Some cleanup, moved the casting to its own function
and added a test case.

I am trying to create a reduced test case but no success so far.

The standard way to do this is to dump the IR just before the vectorizer, and then use bugpoint to reduce.
Have you tried that?

I did that already here: https://llvm.org/PR33804#c5

Thanks,
Added the test case to the review.

manojgupta marked 3 inline comments as done.Jul 18 2017, 3:49 PM
manojgupta added inline comments.
lib/Transforms/Vectorize/LoopVectorize.cpp
2994

Added assert in the createBitCast function.

Ayal added inline comments.Jul 18 2017, 11:36 PM
lib/Transforms/Vectorize/LoopVectorize.cpp
2948

Thanks. This interleaved loads case requires a test.

2983

This swapped case requires a test.

3337

This function assumes V has vector type, having same number of elements as VTy, and both have elements of same size in bits. These properties can be asserted upfront.

Suffice to check once if direct cast works and exit early if so.
If not, assert one is a pointer and the other a floating point.
Suffice to have only one copy of the code creating the two casts.

Explain (in method declaration above) that we're bit casting from V to VTy, and/or use more informative variable names.

mkazantsev added inline comments.
lib/Transforms/Vectorize/LoopVectorize.cpp
3355

You can sink these 3 lines and last 3 lines of "else" branch to after the "if" (if IntTy is decpared above the if).

rengolin added inline comments.Jul 19 2017, 3:48 AM
test/Transforms/LoopVectorize/pr33804.ll
10 ↗(On Diff #107194)

The ARM backend is not always compiled in all bots, this will break them. Please remove the triple.

Keeping the datalayout may be enough, if not, move them to target-specific directories and make sure you have at least on two very different targets (like ARM32 and x86_64).

86 ↗(On Diff #107194)

do you really need all those attributes? Try to remove them and see if they change the results. If not, just clean them up.

96 ↗(On Diff #107194)

You probably don't need any of those either.

Meinersbur added inline comments.Jul 19 2017, 8:46 AM
test/Transforms/LoopVectorize/pr33804.ll
1 ↗(On Diff #107194)

It looks like you are only CHECKing the IR output (-S). If this is the case you can remove -debug 2>&1 and REQUIRES: asserts

Mixing stderr and stdout is problematic because how they are interleaved is undefined.

Meinersbur added inline comments.Jul 19 2017, 9:58 AM
test/Transforms/LoopVectorize/pr33804.ll
19–58 ↗(On Diff #107194)

I assume none of these are needed and bugpoint was just unable to remove invoke calls. You could try to remove them manually.

Back to it after a (long) break.

  1. Added tests for both pointer->float and float->pointer cases.
  2. CLeaned up most conditions into asserts.
  3. Moved tests to Codegen/ARM. The crash disappears without arm tuple so can't remove it.
  4. Simplified/cleanedup the tests.

Removed -debug and stderr usage from the test cases.

rengolin added inline comments.Aug 21 2017, 8:37 AM
lib/Transforms/Vectorize/LoopVectorize.cpp
590

NIT: I think you should keep the createBitOrPointerCast name, to make it clear.

3298

Could you just keep this version?

Most of the code above will be nops anyway, no need to fast-track identical calls.

test/CodeGen/ARM/loopvectorize_pr33804_1.ll
19

Do you really need all the exception handling here? I imagine that the loop below would do just fine.

test/CodeGen/ARM/loopvectorize_pr33804_2.ll
19

Same here. Also, you could merge the two function into one single IR file.

Addressed the following comments:

  1. Simplified the test cases and merged into a single file.
  2. Renamed the function.
  3. Kept a single call to direct bitcast.
manojgupta retitled this revision from [LoopVectorizer] Use two step casting for float to pointer type. to [LoopVectorizer] Use two step casting for float to pointer types..
manojgupta edited the summary of this revision. (Show Details)
manojgupta added reviewers: rengolin, srhines.
manojgupta removed a subscriber: javed.absar.

Updated summary.

Friendly ping for review.

Ayal added a comment.Aug 31 2017, 2:06 AM
  1. Moved tests to Codegen/ARM. The crash disappears without arm tuple so can't remove it

Is that because the stores don't get interleaved?

lib/Transforms/Vectorize/LoopVectorize.cpp
3287

Remove the first assert by turning dyn_cast<> into cast<>.

Add a message to the second assert.

3290

Add a message to the assert.

3301

ditto

3303

check clang-format indentation.

test/CodeGen/ARM/loopvectorize_pr33804.ll
17 ↗(On Diff #112752)

Best force vectorization width to 8 if that's the VF we expect; or expect a vector store of any width.

38 ↗(On Diff #112752)

Explain how the second test differs from the first; i.e., the first requires casting the float value to be stored, into a pointer, and the second requires casting the pointer value into a float.

63 ↗(On Diff #112752)

What about tests for float<->pointer conversions on interleaved groups of loads?

rengolin accepted this revision.Aug 31 2017, 2:07 AM

Sorry for the delay. It looks good to me now, thanks!

This revision is now accepted and ready to land.Aug 31 2017, 2:07 AM

Sorry, slight overlap. Please follow Ayal's comments before committing.

Ayal added a comment.Aug 31 2017, 4:06 AM

Sorry, slight overlap. Please follow Ayal's comments before committing.

My last comments are minor; I'm also ok with the patch after they are addressed.
Another final comment is to also have a test that compiles to 64 bits having a struct with double and pointer fields.

  1. Added messages to asserts.
  2. Fixed some indentation issues.
  3. Added test case for double <-> pointer for AArch64. I could not reproduce this on x86_64.
  4. Added more comments in unit tests to make it more clear.
manojgupta marked 2 inline comments as done.Aug 31 2017, 12:42 PM
manojgupta added inline comments.
test/CodeGen/ARM/loopvectorize_pr33804.ll
17 ↗(On Diff #112752)

Changed it to expect a vector store of any width since the concern is vectorizer should not crash, not the vectorized size.

63 ↗(On Diff #112752)

Because of my limited knowledge of how loop vectorizer works, I am not yet able to create a test case for loads which triggers float <-> pointer casting. Will try to create a test case for loads in a follow up commit.
I did add another test case for double <-> pointer for AArch64.

Ayal added inline comments.Sep 1 2017, 2:58 AM
test/CodeGen/ARM/loopvectorize_pr33804.ll
63 ↗(On Diff #112752)

You can simply first load the two fields, say fadd 0.5 to one and do pointer++ using gep to the other, then store them back; or even just copy them from one array to another.

manojgupta marked an inline comment as done.Sep 1 2017, 6:44 AM
manojgupta added inline comments.
test/CodeGen/ARM/loopvectorize_pr33804.ll
63 ↗(On Diff #112752)

Tried that but still no luck. Storing back to same addresses or even to some globals inside/outside the loop didn't help.
Is to ok to check this in first and try adding a test case for load in a follow up commit?

manojgupta closed this revision.Sep 1 2017, 8:37 AM
Ayal added a comment.Sep 16 2017, 6:17 AM

This was closed due to committing r312331, right? Code LGTM, for the record. Tests for interleaved loads of float/pointer should still be added, as this patch presumably handles them too.

I was able to generate the test cases for load ( https://reviews.llvm.org/D37967)