This is an archive of the discontinued LLVM Phabricator instance.

Add Support for Small Size Reductions
ClosedPublic

Authored by mssimpso on Aug 20 2015, 11:42 AM.

Details

Summary

Unlike scalar operations, we can perform vector operations on
element types that are smaller than the native integer types. We
type-promote scalar operations if they are smaller than a native
type (e.g., i8 arithmetic is promoted to i32 arithmetic on Arm
targets). This patch detects and removes type-promotions within the
reduction detection framework, enabling the vectorization of small
size reductions.

In the legality phase, we look through the ANDs and extensions that
InstCombine creates during promotion, keeping track of the smaller
type. In the profitability phase, we use the smaller type and
ignore the ANDs and extensions in the cost model. Finally, in the
code generation phase, we truncate the result of the reduction to
allow InstCombine to rewrite the entire expression in the smaller
type.

This fixes PR21369.

Diff Detail

Event Timeline

mssimpso updated this revision to Diff 32717.Aug 20 2015, 11:42 AM
mssimpso retitled this revision from to Add Support for Small Size Reductions.
mssimpso updated this object.
mssimpso added reviewers: jmolloy, hfinkel.
mssimpso added subscribers: llvm-commits, mcrosier.
mcrosier added a subscriber: Gerolf.

Just some minor style comments.

lib/Transforms/Utils/LoopUtils.cpp
46

You could simplify this with just a 'return true;'

47

The default case is generally at the top of the switch unless all cases are covered. In that case (i.e., all cases of the enum are covered), the default case should be excluded. :P

http://llvm.org/docs/CodingStandards.html#don-t-use-default-labels-in-fully-covered-switches-over-enumerations

50

return false;

58

Same comments as above.

75

You could reduce indentation by using an early exit:

if (!Phi->hasOneUse())

return Phi;

http://llvm.org/docs/CodingStandards.html#use-early-exits-and-continue-to-simplify-code

101

The mixing of the variable with a bool makes me slightly concerned. It might just be me, but perhaps we could clean this up with two bools or an enum.

132

This is the part that concerns me with mixing int with bool.

mssimpso updated this revision to Diff 32748.Aug 20 2015, 2:24 PM

Addressed @mcrosier's comments. Thanks, Chad!

mssimpso marked 7 inline comments as done.Aug 20 2015, 2:30 PM
jmolloy edited edge metadata.Aug 24 2015, 12:46 PM

Hi,

I'm really sorry I haven't got to this review yet. I'll look at it tomorrow.

James

No problem, James. Also, feel free to include me on anything you're working on.

Hi,

I really like the improvements to RecurrenceDescriptor (even if it does extend the truly horrible API - someone needs to get around to making that decent).

I can see how this works. I'm just slightly concerned that it's making reductions a special case, and I can't see how this would be extended to support non-reduction values (like load->arithmetic->store).

The latter is most interesting, because we can't just hand them off to InstCombine to sort out, because InstCombine will only look at chains 6 deep.

I have a patch almost ready that does a similar thing but in a different way. My patch uses the DemandedBits analysis and unions the demanded bits for each def/use chain, then determines if the union of demanded bits allows the entire def/use chain to be truncated (high bits unset).

That said the knowledge you've given RecurrenceDescriptor is really good, because we have a very correlated problem in SLPVectorizer when building horizontal reduction chains.

What are your thoughts about how this approach would scale to non-reductions? Perhaps that might decide which way we go?

Cheers,

James

James,

Thanks for the comments. For the non-reduction case, there is no legality issue, so I was assuming we would just ignore the appropriate instructions in the cost model. However, if InstCombine isn't always capable of performing the rewrite, this is a problem. I really like the idea of using the demanded bits analysis. In the ideal case, I think the vectorizer would widen each instruction to the appropriate bit width and VF without depending on InstCombine to rewrite things. I would really like to see your patch when it's ready!

Aside from code generation, there are other similarities with the reduction and non-reduction cases (e.g., telling the cost model to ignore the cast instructions). For the legality of the reductions, we still need to look through the AND instructions, but I think this could probably be made much more general.

What are your thoughts on splitting this work up into three separate components? (1) changes to RecurenceDescriptor for legalizing the reduction, (2) changes to the cost model for ignoring the cast instructions, and (3) changes to the code generator for creating the correct types. (2) and (3) would presumably be common to the reduction and non-reduction cases and could use the demanded bits analysis if we decide to go that route. We could also split up the work to avoid duplication.

jmolloy requested changes to this revision.Aug 26 2015, 6:34 AM
jmolloy edited edge metadata.

Hi,

You've convinced me that our two patches can actually sit side by side. Please see my specific comments in the patch.

Cheers,

James

include/llvm/Transforms/Utils/LoopUtils.h
197

What would a non-arithmetic kind be? I'm not sure I understand this.

203

I don't like this interface. But that shouldn't stop this patch I think - I think we need to have a good hard redesign of the interface for RecurrenceDescriptor (and InductionDescriptor which I'm ripping out of LoopVectorize in D12284.

lib/Transforms/Utils/LoopUtils.cpp
270

I think you'vr got two sets of parens too many here.

lib/Transforms/Vectorize/LoopVectorize.cpp
3332

You're got too many parens here.

3336

Wouldn't it be just as easy to AND with Type->getMask()?

This revision now requires changes to proceed.Aug 26 2015, 6:34 AM
mssimpso marked 2 inline comments as done.Aug 26 2015, 9:33 AM
mssimpso added inline comments.
include/llvm/Transforms/Utils/LoopUtils.h
197

The intent was to select the recurrence kinds that would be subject to type-promotion (ADD, MUL). We don't need to worry about the bitwise (AND, OR, XOR) or logical (MIN/MAX) operations.

203

I agree, and I look forward to the re-design. Having said that, I'm happy to improve this interface in the meantime if you have specific concerns.

lib/Transforms/Vectorize/LoopVectorize.cpp
3336

InstCombine isn't smart enough to rewrite the expression in the narrower type if we use an AND. Specifically, InstCombiner::EvaluateInDifferentType() relies on CanEvaluateTruncated() to succeed. So we need to insert the trunc unless we want to rewrite the expression ourselves. One alternative would be to extend InstCombine to handle the AND. Another alternative would be to rewrite the expression in the correct width on the fly as we vectorize. This would avoid the trunc/ext, but I think both alternatives would be more complex to implement. What do you think?

mssimpso updated this revision to Diff 33213.Aug 26 2015, 9:35 AM
mssimpso edited edge metadata.

Addressed @jmolloy's comments.

jmolloy accepted this revision.Aug 27 2015, 1:05 AM
jmolloy edited edge metadata.

LGTM, with the comment changes requested below.

Cheers!

James

include/llvm/Transforms/Utils/LoopUtils.h
197

Hi,

OK, so lets keep the function name but could you please explain in a comment?

lib/Transforms/Vectorize/LoopVectorize.cpp
3336

This isn't ideal, but I don't really want to add feature creep to this review. Could you add a fixme saying what the ideal would be?

This revision is now accepted and ready to land.Aug 27 2015, 1:05 AM
mssimpso updated this revision to Diff 33324.Aug 27 2015, 7:11 AM
mssimpso edited edge metadata.

Updated comments.

Thanks very much for the review, James! Chad, would you mind landing this
revision when you have a chance? Thanks!

mcrosier closed this revision.Aug 27 2015, 7:13 AM

Committed r246149.