This is an archive of the discontinued LLVM Phabricator instance.

[X86] replace vinsertf128 intrinsics with generic shuffles
ClosedPublic

Authored by spatel on Mar 5 2015, 12:30 PM.

Details

Summary

This is the sibling patch for the LLVM half of this change:
http://reviews.llvm.org/D8086

We want to remove as much custom x86 shuffling nonsense as possible.

For the avxintrin.h header, it's not clear to me how we decide between macro implementations and inline function implementations. I thought macros were generally frowned upon, so I went with inlines.

Diff Detail

Event Timeline

spatel updated this revision to Diff 21300.Mar 5 2015, 12:30 PM
spatel retitled this revision from to [X86] replace vinsertf128 intrinsics with generic shuffles.
spatel updated this object.
spatel edited the test plan for this revision. (Show Details)
spatel added a subscriber: Unknown Object (MLST).
craig.topper edited edge metadata.Mar 5 2015, 12:44 PM

Macros are needed when clang's integer constant evaluation is being used to create the inputs to the vector shuffle.

I believe with the inline version you've used the IR will contain two shuffles and the optimizers in the backend will have to reduce it.

I think I'd prefer the macro approach and only one shuffle with math on the immediates.

spatel updated this revision to Diff 21319.Mar 5 2015, 3:24 PM
spatel edited edge metadata.

Ah - I had looked at a problem where we accept a variable when only a constant is acceptable:
http://llvm.org/bugs/show_bug.cgi?id=21713

Not sure if that's still a bug.

In any case, patch updated:

  1. Use macros.
  2. Shuffle mask elements generated individually via selects.
  3. More test cases added to confirm the that the masks are good.

Thanks for working on this Sanjay!

lib/Headers/avxintrin.h
1160

The use of macros leaves these open to Wshadow warnings - what is the effect of casting them directly in the __builtin_shufflevector args?

1173

Ditto Wshadow

1182

Ditto Wshadow

1188

Similar to D8086 - shouldn't the immediate value be masked to just use the lsb? ((M)&1)

spatel updated this revision to Diff 21349.Mar 6 2015, 7:00 AM

Updated to ignore upper bits of the immediate input.

spatel added a comment.Mar 6 2015, 7:10 AM

The use of macros leaves these open to Wshadow warnings -
what is the effect of casting them directly in the __builtin_shufflevector args?

I admit that I have no clue - I just copy pasted from the existing macros in this file.

I wasn't sure why those names had double-leading-underscores, but now that you mention shadows, I assume it's precisely to avoid that problem.

As an experiment, I removed the temp values by casting the inputs directly into the shuffles, and I don't see any warnings.

But I think if we want to make this change, then we should make it for all of the macros in this file in a follow-on commit. That way the code will be consistently implemented, and if we've missed some nuance of macro-fication, we can revert easily.

The use of macros leaves these open to Wshadow warnings -
what is the effect of casting them directly in the __builtin_shufflevector args?

I admit that I have no clue - I just copy pasted from the existing macros in this file.

I wasn't sure why those names had double-leading-underscores, but now that you mention shadows, I assume it's precisely to avoid that problem.

The double-leading-underscores are not used to avoid the warnings. If you remove the extra underscores, you would still get the [-Wshadow] warnings.
In my experiments, you get those warnings only if you explicitly pass flags -Wshadow and -save-temps. Alternatively, you would only need to pass -Wshadow on a preprocessed file.

As an experiment, I removed the temp values by casting the inputs directly into the shuffles, and I don't see any warnings.

But I think if we want to make this change, then we should make it for all of the macros in this file in a follow-on commit. That way the code will be consistently implemented, and if we've missed some nuance of macro-fication, we can revert easily.

spatel added a comment.Mar 6 2015, 8:27 AM

The double-leading-underscores are not used to avoid the warnings. If you remove the extra underscores, you would still get the [-Wshadow] warnings.

I may be misunderstanding, but I thought the double-leading-underscores were used to avoid interacting with the user's namespace. Double-leading-underscore is reserved for the compiler only?

I believe the Wshadow warnings occur when you pass a call to one of the macros as an argument to another macro. This causes the same variable name to occur in the inner and outer macro expansion. This occurs regardless of any underscores.

I'm guessing it requires save-temps to expose it because clang is probably smart enough to suppress the warning when macro expansion and compiling are done together. But with save-temps, compiling happens on the file written out by the preprocessor so clang doesn't know about the macro anymore.

This can prevented if you don't create intermediate variables inside the macro.

spatel updated this revision to Diff 21493.Mar 9 2015, 9:41 AM

Updated patch to not use intermediate variables in the new macros.

Thanks everyone for the explanation. I must be missing a step because I was not seeing any warnings using a preprocessed file and -Wshadow. Please let me know if this version solves the problem.

craig.topper accepted this revision.Mar 9 2015, 8:22 PM
craig.topper edited edge metadata.

LGTM

This revision is now accepted and ready to land.Mar 9 2015, 8:22 PM
This revision was automatically updated to reflect the committed changes.

Thanks, Craig! Checked in at r231792.