This is an archive of the discontinued LLVM Phabricator instance.

fix invalid load folding with SSE/AVX FP logical instructions (PR22371)
ClosedPublic

Authored by spatel on Jul 23 2015, 3:11 PM.

Details

Summary

This is a follow-up to the FIXME that was added with D7474 ( http://reviews.llvm.org/rL229531 ).
I thought this load folding bug had been made hard-to-hit, but it turns out to be very easy when targeting 32-bit x86 and causes a miscompile/crash in Wine:
https://bugs.winehq.org/show_bug.cgi?id=38826
https://llvm.org/bugs/show_bug.cgi?id=22371#c25

The quick fix is to simply remove the scalar FP logical instructions from the load folding table in X86InstrInfo, but that causes us to miss load folds that should be possible when lowering fabs, fneg, fcopysign. So the majority of this patch is altering those lowerings to use *vector* FP logical instructions (because that's all x86 gives us anyway). That lets us do the load folding legally.

The test case for PR2656 was actually checking for miscompiled code, so I changed that. I added the latest test case from PR22371 for extra verification. The changes in sse-fcopysign.ll look benign to me; just different scheduling / RA. I'm not sure why we had 'vandps' and now have 'vandpd' in vec_fabs.ll, but those are logically identical.

Diff Detail

Repository
rL LLVM

Event Timeline

spatel updated this revision to Diff 30525.Jul 23 2015, 3:11 PM
spatel retitled this revision from to fix invalid load folding with SSE/AVX FP logical instructions (PR22371).
spatel updated this object.
spatel added reviewers: chandlerc, qcolombet, hfinkel.
spatel added a subscriber: llvm-commits.
hans added a subscriber: hans.Jul 23 2015, 5:05 PM
hans added a comment.Jul 27 2015, 1:23 PM

Is anyone reviewing this? It would be nice to get this PR fixed for 3.7.

chandlerc accepted this revision.Jul 27 2015, 3:17 PM
chandlerc edited edge metadata.

LGTM, and looks good for the branch as well.

test/CodeGen/X86/pr2656.ll
9–10 ↗(On Diff #30525)

I'd specifically call out that we *can* do a 16-byte constant pool load for the xorps mask used to negate these values, it just isn't folded because it is used twice. Otherwise it's a bit confusing to read the comment followed by this particular example.

This revision is now accepted and ready to land.Jul 27 2015, 3:17 PM
spatel added inline comments.Jul 27 2015, 5:37 PM
test/CodeGen/X86/pr2656.ll
9–10 ↗(On Diff #30525)

Yes, that is confusing on 2nd look. I'll fix that and get this checked in.

Thanks for the prompt review!

This revision was automatically updated to reflect the committed changes.
hans added a comment.Jul 28 2015, 9:21 AM

Merged to 3.7 in r243435.