This is an archive of the discontinued LLVM Phabricator instance.

fix for PR20354 - Miscompile of fabs due to vectorization
AbandonedPublic

Authored by spatel on Jul 22 2014, 3:07 PM.

Details

Summary

In PR20354 ( http://llvm.org/bugs/show_bug.cgi?id=20354 ), we're miscompiling a vector fabs operation.

This patch corrects that case and allows optimization of vector fabs ops via sign bit twiddling rather than using FP instructions. It also changes the logic in visitFNEG to allow vector fneg ops to be optimized in a similar way.

The fabs and fneg cases are similar enough that we should probably refactor the code to reduce duplication, but I don't want to add that complication to this patch.

This patch breaks an existing ARM testcase in test/CodeGen/ARM/2009-10-21-InvalidFNeg.ll. That test was expecting use of VFPU/NEON, but now we don't even need to touch the FPU. I think that's universally better for any ARM target?

I've added testcases to the existing X86 tests for vec_fabs and vec_neg. Also added a FIXME to the existing tests in vec_fneg.ll because they don't have any checks.

Diff Detail

Event Timeline

spatel updated this revision to Diff 11787.Jul 22 2014, 3:07 PM
spatel retitled this revision from to fix for PR20354 - Miscompile of fabs due to vectorization.
spatel updated this object.
spatel edited the test plan for this revision. (Show Details)
spatel added reviewers: rengolin, chandlerc, nadav.
spatel added a subscriber: Unknown Object (MLST).
rengolin added inline comments.Jul 28 2014, 9:28 AM
test/CodeGen/ARM/2009-10-21-InvalidFNeg.ll
14

I'm not sure what was the purpose of this test, but you can't just assume that it's ok. Can you share the resulting code?

Currently, it produces this:

add	r1, sp, #36
add	r0, r0, #48
vld1.32	{d16[0]}, [r1:32]
add	r1, r1, #4
vld1.32	{d16[1]}, [r1:32]
add	r1, sp, #44
vld1.32	{d17[0]}, [r1:32]
add	r1, r1, #4
vld1.32	{d17[1]}, [r1:32]
vneg.f32	q8, q8
vst1.64	{d16, d17}, [r0:128]
bx	lr

Which I agree, doesn't look like a clear winner, but it might be a side effect of the original intention...

chandlerc edited edge metadata.Jul 28 2014, 1:06 PM

Pinging this vigorously as it is a miscompile fix that has gone without
review for a week... Adding more folks.

pete added a subscriber: pete.Jul 28 2014, 10:34 PM

This optimization itself looks useful, and the code is good, but surely this is hiding a bug? If so, the optimization should be committed later, once we know the real cause of the bug.

You're causing more code to hit the DAG combine here, i.e., the vector case will now do so. But to have had a miscompile the code must have been missing this combine, and hitting something later which is going wrong.

Is there another DAG combine, or even part of legalization which is wrong? I'd expect it to be common code if both ARM and X86 exhibit the same bug.

Hi Pete and Renato -

Thank you very much for the feedback. I won't be back at an LLVM dev machine until next week, so I can't generate the new code for the ARM test at the moment.

The cause of the bug for fabs in PR20354 is that we are incorrectly guarding against vector operands, so we produce a bit mask where only the *first* high bit of a vector was masked off: 0x7fffffffffffffff. We need to mask off each vector element's high bit: 0x7fffffff7fffffff. I'll try to make this clearer by adding some comments.

I think we can just fix the bug in FABS by changing the 'if' check in visitFABS to match what already exists in visitFNEG. Ie, this part of the check:
!N0.getOperand(0).getValueType().isVector()

would become:
!VT.isVector()

Of course, this will not generate the optimal code for vectors, but it should avoid the miscompilation.

pete added a comment.Jul 29 2014, 8:51 AM

Hi Sanjay

Oh yeah, I see the fix you've made for the mask, and i totally agree that fix is correct, and required.

My point is that because of !VT.isVector(), we shouldn't even be hitting the code inside that branch right now. So if you're getting miscompilations, its due to bad code elsewhere.

Or is the point that you're only getting miscompilations *after* you remove !VT.isVector(), and thats what you're fixing?

Thanks.

spatel abandoned this revision.Aug 3 2014, 3:16 PM

Hi Pete -
We're getting a miscompilation from the existing / unchanged code for visitFABS. I've tried to combine a bug fix and two optimizations into this single patch, and I think that was a mistake because that's causing confusion.

Let me abandon this patch, submit a new patch that *only* fixes the bug for FABS, follow that up with a patch that optimizes the vector case for FABS, and finally submit a final patch that optimizes the vector case for FNEG.

pete added a comment.Aug 3 2014, 3:45 PM

Thanks for the clarification and for spinning off the other work in to its own bug.

The remainder in this bug LGTM given the fix in the other patch.

Thanks,
Pete

rengolin edited edge metadata.Aug 3 2014, 11:33 PM

I'm still worried about the ARM changes. Not that it looks wrong, just that the test doesn't tell me the whole picture. Can you share the resulting asm, please?

--renato

Hi Renato -

Certainly, I will put the ARM code change into the forthcoming patch for FNEG and cc you. Ok to do that in the new patch rather than putting anything more in this abandoned patch?

I have the FABS optimization patch proposal ready right now, so I will add you as a reviewer on that too. I hope the extra comments that I've included there will make it more obvious how the code is expected to change.

Sure, I only realised it was abandoned after I commented... ;)

--renato