Page MenuHomePhabricator

[X86] replace vinsertf128 intrinsics with generic shuffles
ClosedPublic

Authored by spatel on Mar 5 2015, 12:14 PM.

Details

Summary

This is my first hack at getting us out of the custom x86 shuffle intrinsic business.
This was suggested in: http://reviews.llvm.org/D7866

Let's start with the vinsertf128 troika. I'll post a clang sibling patch shortly.

Please let me know if I've missed anything. I'm basing these changes on:
http://llvm.org/viewvc/llvm-project?view=revision&revision=230860

If anyone can explain why vinsertf128 with a 0 immediate exists, I'm curious...

Diff Detail

Repository
rL LLVM

Event Timeline

spatel updated this revision to Diff 21295.Mar 5 2015, 12:14 PM
spatel retitled this revision from to [X86] replace vinsertf128 intrinsics with generic shuffles.
spatel updated this object.
spatel edited the test plan for this revision. (Show Details)
spatel added a subscriber: Unknown Object (MLST).
chandlerc added inline comments.Mar 5 2015, 12:51 PM
lib/IR/AutoUpgrade.cpp
644–649 ↗(On Diff #21295)

I think you should teach the CreateShuffleVector method to also accept a array ref of ints and to use the -1 -> undef, mapping to produce the constants for you. Then you can use std::iota to remove all the loops in this code.

Hi Sanjay,

lib/IR/AutoUpgrade.cpp
639 ↗(On Diff #21295)

What happens if Imm is for example 2 or 4?

According to the Intel documentation: "The high 7 bits of the immediate are ignored". So, only the first bit of 'Imm' has a meaning in this context.
However, your code doesn't clear the upper bits of 'Imm'. So (unless I misread the code) the for loops between lines 665 and 674 would propagate the wrong indices if Imm is an even number bigger than zero.

Can you add a test for the case where Imm is a non-zero even number?

RKSimon added a subscriber: RKSimon.Mar 5 2015, 2:46 PM

If anyone can explain why vinsertf128 with a 0 immediate exists, I'm curious...

IIRC its only real use is for loading xmm into the ymm - using a blend would reference a whole 256-bit vector of memory that might not be valid.

spatel added a comment.Mar 5 2015, 5:18 PM

If anyone can explain why vinsertf128 with a 0 immediate exists, I'm curious...

IIRC its only real use is for loading xmm into the ymm - using a blend would reference a whole 256-bit vector of memory that might not be valid.

Yes, the memop form is different. I see only one regression test with a memop vinsertf128 (stack-folding-fp-avx1.ll), and it has an immediate 1 operand. From what I've seen so far, we're always going to replace the reg/reg form with imm 0 with a blendi after this change. This should be good because vblendps/pd have better perf on every chip that I checked. Will have to see what happens in the load fold case.

spatel added inline comments.Mar 5 2015, 5:25 PM
lib/IR/AutoUpgrade.cpp
639 ↗(On Diff #21295)

Yes, that would be a bug. We need to ignore the high bits.

The software intrinsic doesn't match the hardware either; it shows an 'int' rather than a 'char'.

spatel added inline comments.Mar 6 2015, 9:15 AM
lib/IR/AutoUpgrade.cpp
644–649 ↗(On Diff #21295)

I had no idea about std::iota. This would apparently be the first use of it anywhere in LLVM!

But if using it is an NFC, do you mind if I do that in a follow-on patch?

Although this raises another question that I had: what is the life expectancy of the AutoUpgrade code? Once we turn off the tap to create these intrinsics in clang, I assume this will be an even colder code path than it already is.

Do we make any guarantees on backward support for deprecated intrinsics? Could we delete this after the next release?

spatel updated this revision to Diff 21371.Mar 6 2015, 10:25 AM

Patch updated:

  1. Mask off all but the low bit of the immediate
  2. Added test case to confirm masking
craig.topper edited edge metadata.Mar 7 2015, 3:20 PM

I believe I was previously told that we had to keep auto-upgrade support for everything from 3.0 onwards (see comment from Chris in r154778 back in April 2012). Maybe they can go away at 4.0?

I believe I was previously told that we had to keep auto-upgrade support for everything from 3.0 onwards (see comment from Chris in r154778 back in April 2012). Maybe they can go away at 4.0?

I thought 4.1 (so 4.0 can read all 3.x input).

I believe I was previously told that we had to keep auto-upgrade support for everything from 3.0 onwards (see comment from Chris in r154778 back in April 2012). Maybe they can go away at 4.0?

I thought 4.1 (so 4.0 can read all 3.x input).

So the plan is for all of the auto-upgrade intrinsics to be dropped at once at 4.1? Would a deprecation warning be useful here (either now or at 4.0) or at least a description in the comments in AutoUpgrade.cpp?

craig.topper accepted this revision.Mar 9 2015, 8:19 PM
craig.topper edited edge metadata.

LGTM

This revision is now accepted and ready to land.Mar 9 2015, 8:19 PM
This revision was automatically updated to reflect the committed changes.

Thanks all for the patch feedback - checked in at r231794 (the Clang half is at r231792).

I didn't add a comment about the expected lifetime of AutoUpgrade code because I wasn't sure what the final answer is. But I agree that we should add a comment once we have consensus. That way anyone looking in that file will have some idea about the policy.