This is an archive of the discontinued LLVM Phabricator instance.

Generation of PSAD in LoopVectorizer
Needs ReviewPublic

Authored by Vijender on Mar 7 2015, 1:27 AM.

Details

Summary

This patch corresponds to the discussion we had in the following link about generation of psad in loop vectorizer
http://permalink.gmane.org/gmane.comp.compilers.llvm.devel/81724

Couple of structures were added to make it generic so that other patterns in the future can be easily integrated.

Diff Detail

Repository
rL LLVM

Event Timeline

Vijender updated this revision to Diff 21419.Mar 7 2015, 1:27 AM
Vijender retitled this revision from to Generation of PSAD in LoopVectorizer.
Vijender updated this object.
Vijender edited the test plan for this revision. (Show Details)
Vijender added reviewers: hfinkel, aschwaighofer, spatel.
Vijender added a subscriber: Unknown Object (MLST).
ab added a subscriber: ab.Mar 7 2015, 4:24 PM
hfinkel edited edge metadata.Mar 9 2015, 10:36 AM

This patch needs regression tests, for for the vectorizer itself (in test/Transforms/LoopVectorize/X86) and CodeGen tests for the new intrinsic in test/CodeGen/X86.

Also, you need generic lowering support in LegalizeDAG for targets that don't support directly lowering the ISD node (and you should make sure that it is set by default to Expand in TargetLoweringBase::initActions).

include/llvm/Analysis/TargetTransformInfo.h
453

If this function applies only to ISDs, then it should not be here. This header only contains functions directly useful to IR-level passes. You can add this as a callback in the BasicTTI implementation, specialized by the targets, in order to share code.

454

Indenting is off.

573

Indenting is off.

729

Indenting is off.

include/llvm/CodeGen/ISDOpcodes.h
647

Please write out what SAD stands for. Please also say something more about the semantics, and any constraints on the relationship between the scalar type and the input vector element type?

include/llvm/IR/Intrinsics.td
584

Replace this comment with:

// Calculate the Sum of Absolute Differences (SAD) of the two input vectors.

Also, I assume only integer vectors are supported, in which case, please say so.

(note that our IR-level intrinsics are defined only by their semantics, and we never guarantee to produce any specific instruction)

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
1291 ↗(On Diff #21419)

I don't understand this comment. Doesn't the beginning of this function assert the type legality of all node operands and return values (except for TargetConstants)?

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
4423 ↗(On Diff #21419)

Why v2i64 for v16i8?

4426 ↗(On Diff #21419)

{ should be on the previous line.

4438 ↗(On Diff #21419)

else should be on the line with the } (per our coding conventions)

4441 ↗(On Diff #21419)

Shouldn't be i32, should the the type from the input intrinsic.

lib/Transforms/Vectorize/LoopVectorize.cpp
630

How about:

// minVF - Specifies the minimum VF for which the target supports this pattern. For example, for x86 the minimum VF for SAD is 8.
2683

cv -> CV

2808

Remove commented-out code.

4251

two -> Two

4358

Line too long.

4548

Widest == Smallest?

5023

Line too long.

Vijender updated this revision to Diff 23038.Apr 1 2015, 5:12 AM
Vijender edited edge metadata.
Vijender set the repository for this revision to rL LLVM.

Hello All,

The Lowering code of SAD is separated from this patch. This patch contains changes in LoopVectorizer and cost modeling. The Lowering code with come as a separate patch. All the previous comments in LoopVectorizer are taken care, Please review the LoopVectorize code.

Thank you,
Vijender

jmolloy added a subscriber: jmolloy.Apr 1 2015, 5:43 AM

Hi Vijender,

Thanks for doing this! I have a bunch of comments below. High-level, why are you lowering this as an intrinsic rather than an IR-level pattern? I ask because we really do want this pattern recognized in straight-line (non vectorized) code too, and that is something the DAGCombiner could do.

Cheers,

James

include/llvm/Analysis/TargetTransformInfo.h
455

Please spell out "SAD" = "Sum of absolute differences" = "C += abs(A - B)" in the docstring, so it is clear (not all backends will use the same mnemonic).

include/llvm/CodeGen/ISDOpcodes.h
647

"of char type"? Do you mean i8? Why only i8? Why is the (scalar part of the) input type not the same as the output type?

include/llvm/IR/Intrinsics.td
596

Char? Why i32? why not i64? why not i8/i16?

597

What is the signedness of this operation? Is there a floating point equivalent? If not, why not?

lib/Transforms/Vectorize/LoopVectorize.cpp
630

Just use "minVF" here, it won't conflict as long as you reference it in the initializer list.

642

I don't like the assumption here that the legality of an operation is a range [min,max]. I don't think we should assume continuity in the range. Perhaps this should be a bitset of supported types instead?

767

Shouldn't this be a multimap? I can imagine that there might be complex patterns and simple patterns, and a complex pattern may subsume a simple pattern. So an instruction could be part of two patterns (until one is selected).

1014

Uh, surely this needs to be just the type of the phi? If smaller types are legal, the phi should have a smaller type, right?

Your proposed algorithm falls down in this case:

int32_t a;
int8_t b;
int32_t *c;
for (...) {
  a += abs(c[i] - c[i+1]);
  b += (int8_t)c[i];
}

Oh dear, you'll select "i8" as your type but that is illegal (it's not part of this PHI).

1050

I generally don't like the use of "Special". Can we not think of a more descriptive name for it, like "Pattern"?

2556

Well this isn't right - Who says the identity for all pattern based reductions is integer zero?

2582

If it assumed that all patterns can be added together, that isn't documented anywhere.

3014

I don't like this at all. You've introduced a new generic pattern type then binned that and special-cased for SAD. Why is SAD special? why is it different from ADD for this case?

3295

Args

3312

No default case?

4862

I think here you're special casing the SpecialPattern stuff when really it shouldn't be special-cased - this is a problem (finding the smallest valid type for an operation) that applies in many cases to most operators, not just your patterns.

I think it gives the biggest impact in ARM/MIPS world though as i8 and i16 aren't legal scalar types for us.

Vijender added a comment.EditedApr 5 2015, 10:16 PM

Hello James,

Thank you for your reply. My comments are below.

Thanks for doing this! I have a bunch of comments below. High-level, why are >you lowering this as an intrinsic rather than an IR-level pattern? I ask because >we really do want this pattern recognized in straight-line (non vectorized) code >too, and that is something the DAGCombiner could do.

[VJ] I got your concern and I do accept your point. But if I handle this in DAGCombiner there will be too many patterns to handle because there will be multiple paths reaching the DAGCombiner. For example if LV is enabled and VF is selected as 4, then the pattern contains instructions expanded to V4. Similar is the case with VF =8. If Vectorizer is disabled then the pattern will be scalar pattern. So basically we want to solve it in divide and conquer approach. Right now we have handled in LV and a serious effort is done by Shahid to handle it in SLP (because there will be cases where LV cannot handle due to loop unrolling and other reasons). Now the only remaining part will be non vectorize path which can be handled in DAGCombiner as you mentioned which will be the future work.

[VJ] The reason for replacing the pattern with an intrinsic is that we don’t want that pattern to be disturbed by other optimizations.

  • Vijender

Thank you James. Comments below.

include/llvm/CodeGen/ISDOpcodes.h
647

[VJ] - The Sum of absolute difference instruction is only appropriate for char Types. It is a byte level reduction operation. If you have a look at PSAD in X86 or USAD in ARM, all happen at byte level.
[VJ] - The scalar type of output is not same as scalar type of input because sum of eight i8's may not fit properly in an i8 without data lose.

include/llvm/IR/Intrinsics.td
596

[VJ] - I accept your point. We can make it to return any type.

597

[VJ] - There is no floating point equivalent of SAD.

lib/Transforms/Vectorize/LoopVectorize.cpp
642

[VJ] - Can you please explain how to solve this issue. Are you telling to have supported types of the instruction in a list and check that instead of checking the VF factor?

767

[VJ] - Ok. How about checking the instruction whether it belongs to any other pattern before inserting in the ActionList. Because I think it will not make sense to keep that instruction in both the patterns. Rather we can select which pattern to associate that instruction before inserting. What do you think?

1014

[VJ] Actually this value is required to select the maximum VF factor to try. So lower the value the higher the vectorization factors I can try.

1050

[VJ] - This "Special" word was suggested in the previous review comments in RFC.

2556

[VJ] - I got your point. I need to send the Phi node as one of the arguments to this function to know more about the type and return the respective identity. Will work on it.

2582

[VJ] - I got your point. I need to send the Phi node as one of the arguments to this function to know more about the type and return the respective instruction. Will work on it.

3014

[VJ] the code inside the IF block reduces the vector to a single scalar value. Here my SAD intrinsic output is already a scalar. I need to skip this merger block for SAD.

4862

[VJ] - So shall I replace the getWidestType in the code with the getSmallestType so that it can try all the vectorization factors? Anyways cost modeling already handles everything properly so I feel there is no harm in using getSmallestType.

Hi Vj,

I've given some more comments. Sorry for any lag on this review, I'm moving house and it's the Easter break.

Cheers,

James

include/llvm/CodeGen/ISDOpcodes.h
647

Hi,

No they don't, not in ARM at least. We have i8, i16, i32 and i64 variants. Also, we mustn't tie target agnostic intrinsics to a specific target or set of targets. It should work with whatever types make sense to it.

Also, ARM at least does not return a scalar. Its SAD instructions happen elementwise on a vector (see VABA/UABA/SABA). So we'd need support for this too. Does X86 have this?

include/llvm/IR/Intrinsics.td
597

Not in X86, but why not in LLVM? LLVM's target agnostic layer is not X86. Does SAD not make sense on FP types?

Also, what about signedness?

congh added a subscriber: congh.Sep 1 2015, 11:13 AM

What is the status of this patch? Is there any plan to move it forward?

In D8136#237540, @congh wrote:

What is the status of this patch? Is there any plan to move it forward?

Hi Congh,

Yes, there is a plan for this.

This work depends on the two llvm intrinsics, llvm@*absdiff and llvm@*hsum. Two patches, http://reviews.llvm.org/D11678
and http://reviews.llvm.org/D10964, related to these intrinsics are under review and almost waiting for review clearance. Once those are through we will update this patch accordingly.

Regards,
Shahid

congh added a comment.Sep 1 2015, 11:00 PM
In D8136#237540, @congh wrote:

What is the status of this patch? Is there any plan to move it forward?

Hi Congh,

Yes, there is a plan for this.

This work depends on the two llvm intrinsics, llvm@*absdiff and llvm@*hsum. Two patches, http://reviews.llvm.org/D11678
and http://reviews.llvm.org/D10964, related to these intrinsics are under review and almost waiting for review clearance. Once those are through we will update this patch accordingly.

Regards,
Shahid

Thank you very much for the information, Shahid! I am looking forward to the check-in of those patches.

congh added a comment.Oct 28 2015, 4:25 PM

Based on the discussion and link above, it is not clear to me how PSAD can be implemented through ABSDIFF and HSUM. Take doing PSAD on 2 x v16i8 -> v?i32 for example: currently v16i8 will be widened before DIFF and ABS, so is ABSDIFF for the widened type (v16i16 or v16i32) or for v16i8 here? How HSUM is used to represent PSAD?

spatel resigned from this revision.Sep 26 2017, 3:49 PM