LoopVectorizer is creating casts between vec<ptr> and vec<float> types
on ARM when compiling OpenCV. Since, tIs is illegal to directly cast a
floating point type to a pointer type even if the types have same size
causing a crash. Fix the crash using a two-step casting by bitcasting
to integer and integer to pointer/float.
Fixes PR33804.
Details
Diff Detail
- Build Status
Buildable 9811 Build 9811: arc lint + arc unit
Event Timeline
lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
2972 | Loop vectorize crashes when trying to do the following cast : As a floating point type cannot be directly casted to a pointer type (even if bitwidth is same), the crash can be avoided using two bitcasts (float->int and int-> pointer). |
Setting aside from the fact that no one should ever be casting floating point to pointers, perhaps we shouldn't be even trying to do this transformation here, as I'm not sure this could have a guaranteed semantics in this case, and probably better to just bail the vectorisation?
Also, tests, etc. please.
I agree that this transformation is unsafe and bailing out is the preferred way. Only if I knew where to add that check.
I'm not sure about this. What's happening here, if I understand correctly, is that we have a struct:
typedef struct CvNode1D { float val; struct CvNode1D *next; } CvNode1D;
And we're trying to vectorize code that loads these structs. Since the pointer and the float have the same width, we can load four of the structs as, e.g. 2 * <4 x float>, and then use shuffles to get a vector of 4 floats and a vectors of 4 pointers.
Anyway, I agree this isn't the right fix, but my gut feeling is that the right fix is to actually allow the builder to create a bitcast between a pointer and a pointer-sized float.
I don't see anything in the langref that makes the semantics of this undefined. Renato, what kind of semantic issues do you see here?
Also, tests, etc. please.
+1
Should I update the Builder's CreateBitOrPointerCast function to handle float to pointer casts using float -> int and int -> pointer casts? I can do that or I can create a local pass specific casting function to handle float -> ptr + other CreateBitOrPointerCast routines used here.
I don't see anything in the langref that makes the semantics of this undefined. Renato, what kind of semantic issues do you see here?
Also, tests, etc. please.
+1
I am trying to create a reduced test case but no success so far.
Should I update the Builder's CreateBitOrPointerCast function to handle float to pointer casts using float -> int and int -> pointer casts? I can do that or I can create a local pass specific casting function to handle float -> ptr + other CreateBitOrPointerCast routines used here.
Why can't you do the bitcast directly inside CreateBitOrPointerCast()? You'll also want to upadte isBitOrNoopPointerCastable(),
Anyway, you probably want a different set of reviewers for that patch - I'm really not the authority on that, and at least Renato seems to have a completely different opinion.
I am trying to create a reduced test case but no success so far.
The standard way to do this is to dump the IR just before the vectorizer, and then use bugpoint to reduce.
Have you tried that?
Right, sorry, I missed the bug reference.
Anyway, I agree this isn't the right fix, but my gut feeling is that the right fix is to actually allow the builder to create a bitcast between a pointer and a pointer-sized float.
I don't see anything in the langref that makes the semantics of this undefined. Renato, what kind of semantic issues do you see here?
I agree the vectorizer can "safely" assume this case is ok because we know the original datatype was a pointer anyway and this is part of a strided load, but I feel we'd be opening a can of worms if we start allowing any float<->pointer conversion by default.
For example, a different case would be pointer->float->double->pointer. C has automatic promotions, and some corner cases may slip and create that sequence, which would destroy the bit-pattern and therefore the memory address.
So, if we can do this in this specific case, Manoj's current fix is "better" than moving it up CreateBitOrPointerCast, because we know what the semantics is. Or maybe we just load them as "data" (i32/i64?) and then bitcast safely?
Makes sense?
Anyway, I agree this isn't the right fix, but my gut feeling is that the right fix is to actually allow the builder to create a bitcast between a pointer and a pointer-sized float.
I don't see anything in the langref that makes the semantics of this undefined. Renato, what kind of semantic issues do you see here?I agree the vectorizer can "safely" assume this case is ok because we know the original datatype was a pointer anyway and this is part of a strided load, but I feel we'd be opening a can of worms if we start allowing any float<->pointer conversion by default.
For example, a different case would be pointer->float->double->pointer. C has automatic promotions, and some corner cases may slip and create that sequence, which would destroy the bit-pattern and therefore the memory address.
So, if we can do this in this specific case, Manoj's current fix is "better" than moving it up CreateBitOrPointerCast, because we know what the semantics is. Or maybe we just load them as "data" (i32/i64?) and then bitcast safely?
Makes sense?
First of all, sorry, you're right, I misread the langref - the cast really is illegal in IR, it's not a builder issue.
I really don't think it should be illegal - but a patch trying to fix a crash is probably not the right place to shave that yak,
I guess this is fine as a stop-gap. I think we need the other direction too, though. That is, should cover both the pointer -> float and float -> pointer cases. (And have tests for both.)
It does to me. The wide load and wide store are simply trying to move the packed bits together; each separate shuffle has its specific (possibly distinct) type.
(A follow-up issue may rise when attempting to interleave loads/stores of different sizes together; we're not there yet.)
A few additional comments below, mostly for completeness. They can be addressed by follow-up patches, if fixing the PR case (only) first is preferred.
lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
2935 | Same floating-point-vs-pointer type casting issue may hold for interleaved loads here as well, right? | |
2964–2965 | While you're at it, please correct this typo: "... cast it to a[n] unified type". Can continue and comment here what this unified type may be. | |
2970 | Or the other way around, when the last appearing store marking the insertion position of the final wide store has SubVT type of floating point, and another member has StoredVec type of a pointer. E.g., when the fields are swapped within the struct. | |
2981 | At this stage assert that this 'else' is not reached, rather than return silently w/o generating the final wide store. Suggest to add a condition to Legal checking that types are compatible when forming interleave-groups. |
lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
2981 | Added assert in the createBitCast function. |
lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
2935 | Thanks. This interleaved loads case requires a test. | |
2970 | This swapped case requires a test. | |
3344 | This function assumes V has vector type, having same number of elements as VTy, and both have elements of same size in bits. These properties can be asserted upfront. Suffice to check once if direct cast works and exit early if so. Explain (in method declaration above) that we're bit casting from V to VTy, and/or use more informative variable names. |
lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
3362 | You can sink these 3 lines and last 3 lines of "else" branch to after the "if" (if IntTy is decpared above the if). |
test/Transforms/LoopVectorize/pr33804.ll | ||
---|---|---|
10 ↗ | (On Diff #107194) | The ARM backend is not always compiled in all bots, this will break them. Please remove the triple. Keeping the datalayout may be enough, if not, move them to target-specific directories and make sure you have at least on two very different targets (like ARM32 and x86_64). |
86 ↗ | (On Diff #107194) | do you really need all those attributes? Try to remove them and see if they change the results. If not, just clean them up. |
96 ↗ | (On Diff #107194) | You probably don't need any of those either. |
test/Transforms/LoopVectorize/pr33804.ll | ||
---|---|---|
1 ↗ | (On Diff #107194) | It looks like you are only CHECKing the IR output (-S). If this is the case you can remove -debug 2>&1 and REQUIRES: asserts Mixing stderr and stdout is problematic because how they are interleaved is undefined. |
test/Transforms/LoopVectorize/pr33804.ll | ||
---|---|---|
19–58 ↗ | (On Diff #107194) | I assume none of these are needed and bugpoint was just unable to remove invoke calls. You could try to remove them manually. |
Back to it after a (long) break.
- Added tests for both pointer->float and float->pointer cases.
- CLeaned up most conditions into asserts.
- Moved tests to Codegen/ARM. The crash disappears without arm tuple so can't remove it.
- Simplified/cleanedup the tests.
lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
579 | NIT: I think you should keep the createBitOrPointerCast name, to make it clear. | |
3319 | Could you just keep this version? Most of the code above will be nops anyway, no need to fast-track identical calls. | |
test/CodeGen/ARM/loopvectorize_pr33804_1.ll | ||
18 ↗ | (On Diff #111920) | Do you really need all the exception handling here? I imagine that the loop below would do just fine. |
test/CodeGen/ARM/loopvectorize_pr33804_2.ll | ||
18 ↗ | (On Diff #111920) | Same here. Also, you could merge the two function into one single IR file. |
Addressed the following comments:
- Simplified the test cases and merged into a single file.
- Renamed the function.
- Kept a single call to direct bitcast.
- Moved tests to Codegen/ARM. The crash disappears without arm tuple so can't remove it
Is that because the stores don't get interleaved?
lib/Transforms/Vectorize/LoopVectorize.cpp | ||
---|---|---|
3308 | Remove the first assert by turning dyn_cast<> into cast<>. Add a message to the second assert. | |
3311 | Add a message to the assert. | |
3322 | ditto | |
3324 | check clang-format indentation. | |
test/CodeGen/ARM/loopvectorize_pr33804.ll | ||
18 | Best force vectorization width to 8 if that's the VF we expect; or expect a vector store of any width. | |
39 | Explain how the second test differs from the first; i.e., the first requires casting the float value to be stored, into a pointer, and the second requires casting the pointer value into a float. | |
64 | What about tests for float<->pointer conversions on interleaved groups of loads? |
My last comments are minor; I'm also ok with the patch after they are addressed.
Another final comment is to also have a test that compiles to 64 bits having a struct with double and pointer fields.
- Added messages to asserts.
- Fixed some indentation issues.
- Added test case for double <-> pointer for AArch64. I could not reproduce this on x86_64.
- Added more comments in unit tests to make it more clear.
test/CodeGen/ARM/loopvectorize_pr33804.ll | ||
---|---|---|
18 | Changed it to expect a vector store of any width since the concern is vectorizer should not crash, not the vectorized size. | |
64 | Because of my limited knowledge of how loop vectorizer works, I am not yet able to create a test case for loads which triggers float <-> pointer casting. Will try to create a test case for loads in a follow up commit. |
test/CodeGen/ARM/loopvectorize_pr33804.ll | ||
---|---|---|
64 | You can simply first load the two fields, say fadd 0.5 to one and do pointer++ using gep to the other, then store them back; or even just copy them from one array to another. |
test/CodeGen/ARM/loopvectorize_pr33804.ll | ||
---|---|---|
64 | Tried that but still no luck. Storing back to same addresses or even to some globals inside/outside the loop didn't help. |
This was closed due to committing r312331, right? Code LGTM, for the record. Tests for interleaved loads of float/pointer should still be added, as this patch presumably handles them too.
NIT: I think you should keep the createBitOrPointerCast name, to make it clear.