This is an archive of the discontinued LLVM Phabricator instance.

[AVX] Lower / fast-isel scalar FP selects into VBLENDV instructions (PR22483)
ClosedPublic

Authored by spatel on Mar 4 2015, 11:01 AM.

Details

Summary

This patch reduces code size for all AVX targets and increases speed for some chips.

SSE 4.1 introduced the useless (see code comments) 2-register form of BLENDV and only in the "packed" float/double flavors. Scalar alias mnemonics would have cost so much...paper. But they distinguished between floats and doubles, so we should be thankful. Wait...

AVX subsequently made the instruction useful by adding a 4-register operand form.

So we just need to paper over the lack of scalar forms of this instruction, complicate the code to choose float or double forms, and use blendv on scalars since all FP is in xmm registers anyway.

This gives us an approximately 50% speed up for a blendv microbenchmark sequence on SandyBridge and Haswell:
blendv : 29.73 cycles/iter
logic : 43.15 cycles/iter

I'm not adding any new test cases because:

  1. fast-isel-select-sse.ll tests the positive side for regular X86 lowering and fast-isel
  2. sse-minmax.ll and fp-select-cmp-and.ll confirm that we're not firing for scalar selects without AVX
  3. fp-select-cmp-and.ll and logical-load-fold.ll confirm that we're not firing for scalar selects with constants.

http://llvm.org/bugs/show_bug.cgi?id=22483

Diff Detail

Repository
rL LLVM

Event Timeline

spatel updated this revision to Diff 21215.Mar 4 2015, 11:01 AM
spatel retitled this revision from to [AVX] Lower / fast-isel scalar FP selects into VBLENDV instructions (PR22483).
spatel updated this object.
spatel edited the test plan for this revision. (Show Details)
spatel added reviewers: chandlerc, qcolombet, mkuper.
spatel added a subscriber: Unknown Object (MLST).
qcolombet accepted this revision.Mar 5 2015, 10:20 AM
qcolombet edited edge metadata.

Hi Sanjay,

LGTM with one comment: could you make two different commits:

  • One for the fast-isel part.
  • One for the selection dag part.

Thanks,
-Quentin

This revision is now accepted and ready to land.Mar 5 2015, 10:20 AM

Hi Quentin -

Thanks for the prompt review!

I considered splitting this into 2 pieces as you suggested, but we would temporarily have a mess in test/CodeGen/X86/fast-isel-select-sse.ll. Instead of having a single set of CHECK lines, we'd have the old codegen for one case and the new codegen for the other case. This file would then have to get updated to the state you see in this patch upon the second commit. Do you see a way to avoid that?

Ah right.

Don't bother with that.
I do not think this is worth the extra work.

Thanks,
Q.

This revision was automatically updated to reflect the committed changes.