This is an archive of the discontinued LLVM Phabricator instance.

[x86, SSE] change patterns for CMPP to float types to allow matching with SSE1 (PR28044)
ClosedPublic

Authored by spatel on Jun 10 2016, 11:57 AM.

Details

Summary

This patch is intended to solve:
https://llvm.org/bugs/show_bug.cgi?id=28044

By changing the definition of X86ISD::CMPP to use float types, we allow it to be created and pass legalization for an SSE1-only target where v4i32 is not legal.

The motivational trail for this change includes:
https://llvm.org/bugs/show_bug.cgi?id=28001

and eventually makes this trigger:
http://reviews.llvm.org/D21190

Ie, after this step, we should be free to have Clang generate FP compare IR instead of x86 intrinsics for SSE C packed compare intrinsics. (We can auto-upgrade and remove the LLVM sse.cmp intrinsics as a follow-up step.) Once we're generating vector IR instead of x86 intrinsics, a big pile of generic optimizations can trigger.

Diff Detail

Event Timeline

spatel updated this revision to Diff 60380.Jun 10 2016, 11:57 AM
spatel retitled this revision from to [x86, SSE] change patterns for CMPP to float types to allow matching with SSE1 (PR28044).
spatel updated this object.
spatel added reviewers: ab, RKSimon, craig.topper.
spatel added a subscriber: llvm-commits.
RKSimon edited edge metadata.Jun 10 2016, 1:27 PM

Is already a number of SSE1/SSE2+ test cases covering all the float comparisons? All I know of is vector-compare-results.ll

lib/Target/X86/X86ISelLowering.cpp
15150–15152

Please can you place comments near each SSECC hardcoded constant explaining what they are?

Is already a number of SSE1/SSE2+ test cases covering all the float comparisons? All I know of is vector-compare-results.ll

I don't think there's an exhaustive test for each possible fcmp predicate in IR, but most of them are covered in commute-fcmp.ll after:
http://reviews.llvm.org/rL272397

Also, I have a patch in progress for the intrinsic reduction that gets enabled by this, and that will change:
sse-intrinsics-fast-isel.ll
sse2-intrinsics-fast-isel.ll
to IR fcmp tests rather than llvm.x86.sse[2].cmp.[ps/pd], so we should have complete coverage after that.

spatel updated this revision to Diff 60404.Jun 10 2016, 2:35 PM
spatel edited edge metadata.

Patch updated:

  1. Add comments to explain magic constant values.
  2. Add TODO comment for potential Intel AVX optimization (not sure how software is supposed to detect this difference between AMD and Intel).
RKSimon accepted this revision.Jun 11 2016, 5:39 AM
RKSimon edited edge metadata.

LGTM

This revision is now accepted and ready to land.Jun 11 2016, 5:39 AM
This revision was automatically updated to reflect the committed changes.