Page MenuHomePhabricator

Code Generator for Gather and Scatter Intrinsics.
ClosedPublic

Authored by delena on Feb 16 2015, 2:25 AM.

Details

Summary

I'm asking to review the CodeGen part of masked gather and scatter. I know that the patch is big, I just don't see how to split it.
Just to remind the syntax:

<4 x double> @masked.gather.v4f64 (<4 x double*> %Ptrs, i32 8, <4 x i1> %Mask, <4 x double> %PassTru)
void @masked.scatter.v4f64 (<4 x double> %Val, <4 x double*> %Ptrs, i32 8, <4 x i1> %Mask)

In the target independent part - I build MGATHER and MSCATTER nodes and legalize vector types.

X86 specific:

  • the both X86 gather and scatter instructions kill mask register (implicitly)
  • the memory operand is in base-index-scale form, where the "index" is a vector of indices

Loop vectorizer, cost model, target transform changes are next in the pipe.

Diff Detail

Event Timeline

delena updated this revision to Diff 20007.Feb 16 2015, 2:25 AM
delena retitled this revision from to Code Generator Patterns for X86 Gather and Scatter.
delena updated this object.
delena edited the test plan for this revision. (Show Details)
delena added reviewers: anemet, craig.topper.
delena set the repository for this revision to rL LLVM.
delena added a subscriber: Unknown Object (MLST).
craig.topper added inline comments.Mar 2 2015, 9:50 PM
../include/llvm/Target/TargetSelectionDAG.td
199 ↗(On Diff #20007)

Why isn't this SDTypeProfile<2. 3 ?

../lib/Target/X86/X86InstrFragmentsSIMD.td
294 ↗(On Diff #20007)

This doesn't appear to be used.

499 ↗(On Diff #20007)

Why so much commented out code?

../lib/Target/X86/X86InstrInfo.td
723 ↗(On Diff #20007)

Why is this commented out?

../utils/TableGen/CodeGenDAGPatterns.cpp
1623 ↗(On Diff #20007)

Shouldn't this be "MadeChange |=" as it is it overwrites the change from the first call.

1624 ↗(On Diff #20007)

Remove space before the --.

delena added inline comments.Mar 3 2015, 12:43 AM
../include/llvm/Target/TargetSelectionDAG.td
199 ↗(On Diff #20007)

I tie the second destination to the one of the sources and create an SDNode node with 3 sources and one destination. This way it works.
Otherwise I need to rewrite a half of the tablegen code.

../lib/Target/X86/X86InstrFragmentsSIMD.td
294 ↗(On Diff #20007)

You are right, I'll remove it, thanks.

499 ↗(On Diff #20007)

This is the real code that will go in. I have the full implementation of the feature including tests, and trying to split it now into small patches for review.

../lib/Target/X86/X86InstrInfo.td
723 ↗(On Diff #20007)

I don't want to put implementation of SelectVectorAddr() in this patch. I just show that the vectoraddr is a special complex pattern that has its own implementation.

../utils/TableGen/CodeGenDAGPatterns.cpp
1623 ↗(On Diff #20007)

You are right, thanks.

1624 ↗(On Diff #20007)

Ok.

hfinkel added inline comments.Mar 4 2015, 10:04 PM
../include/llvm/CodeGen/ISDOpcodes.h
686 ↗(On Diff #20007)

For these nodes (and also the MLOAD/MSTORE nodes above), we need comments explaining what the operands are.

../lib/Target/X86/X86InstrInfo.td
723 ↗(On Diff #20007)

Okay, then the comment should say that.

delena updated this revision to Diff 22547.Mar 24 2015, 2:56 AM
delena retitled this revision from Code Generator Patterns for X86 Gather and Scatter to Code Generator for Gather and Scatter Intrinsics..
delena updated this object.
delena edited the test plan for this revision. (Show Details)
delena added reviewers: HaoLiu, rengolin.

Craig Topper submitted TableGen changes that allow to specify the "gather" as a node with two destination registers.

craig.topper added inline comments.Mar 26 2015, 11:41 PM
../include/llvm/Target/TargetSelectionDAG.td
200 ↗(On Diff #20007)

I think we should add SDTCisPtrTy for the last argument to both scatter and gather?

../lib/Target/X86/X86InstrFragmentsSIMD.td
499 ↗(On Diff #20007)

Oh was this commented out to simplify review?

include/llvm/CodeGen/SelectionDAG.h
876 ↗(On Diff #22547)

Second line isn't quite aligned right. Same with the line below.

include/llvm/CodeGen/SelectionDAGNodes.h
2038 ↗(On Diff #22547)

No need for parentheses

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
5214

Please add a blank line between this function and the previous function.

5269

Is there a reason this is commented out instead of just removed?

lib/Target/X86/X86ISelLowering.cpp
1392

No space before EltSize.

delena updated this revision to Diff 22852.Mar 29 2015, 5:49 AM

Hi Craig,
Thank you for review.

I addressed all comments, and uploaded a new diff.

delena updated this revision to Diff 22853.Mar 29 2015, 6:27 AM

Ping..

Thanks.

rengolin edited edge metadata.Apr 16 2015, 12:45 PM

Hi Elena,

<4 x double> @masked.gather.v4f64 (<4 x double*> %Ptrs, i32 8, <4 x i1> %Mask, <4 x double> %PassTru)
void @masked.scatter.v4f64 (<4 x double> %Val, <4 x double*> %Ptrs, i32 8, <4 x i1> %Mask)

All new intrinsics need change to the LangRef document, to make sure we don't forget any. It also makes it easier for reviewers to understand the clear semantics in relation to the code. Please update docs/LangRef.rst.

cheers,
--renato

ab added a subscriber: ab.Apr 28 2015, 10:48 AM

Skimming through, the logic seems sound; a few nits (and a scary missing break).

Thanks!
-Ahmed

include/llvm/CodeGen/SelectionDAGNodes.h
1988 ↗(On Diff #22853)

One "is" too many.

lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
1358

missing break?

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
3724–3727 ↗(On Diff #22853)

{ } for the "if" as well, perhaps?

3775 ↗(On Diff #22853)

nullptr

3783–3786 ↗(On Diff #22853)

Same, { } ?

test/CodeGen/X86/masked_gather_scatter.ll
108–144

Missing CHECKs ?

145–148

Needless space?

test/CodeGen/X86/masked_memop.ll
10–12

This is surprising, do we know why this happens?

delena updated this revision to Diff 24602.Apr 29 2015, 1:58 AM
delena edited edge metadata.
delena removed rL LLVM as the repository for this revision.

I followed Ahmed's review and uploaded a new patch.

rengolin added inline comments.Apr 29 2015, 3:54 AM
test/CodeGen/X86/masked_memop.ll
10–12

I'm guessing the order has changed. Using AVX2-DAG on both and keeping the 32 as before should work, too, and be more explicit.

; AVX2-DAG: vpmaskmovd      32(%rdi)
; AVX2-DAG: vpmaskmovd      (%rdi)

Then, the 32 can come in any order.

ab added inline comments.Apr 30 2015, 7:44 AM
test/CodeGen/X86/masked_memop.ll
10–12

Right, -DAG would work. What I'm saying is: the order change is a symptom - innocuous, yes - of something else - probably innocuous as well. We're just legalizing entirely new nodes and intrinsics. Why would any of the existing codegen ever change?

Elena, if you could spare some time to investigate this, it'd be much appreciated!

The test passes without this change, so it is unrelated to gather/scatter . I just revisited all my tests and thought that setting exact order of these 2 operations in incorrect.

rengolin accepted this revision.Jul 28 2015, 4:21 AM
rengolin edited edge metadata.
This revision is now accepted and ready to land.Jul 28 2015, 4:21 AM
rengolin closed this revision.Jul 28 2015, 4:21 AM

r236211