This is an archive of the discontinued LLVM Phabricator instance.

[LoopVectorize] Enable integer Mul and Add as select reduction patterns
ClosedPublic

Authored by MattDevereau on Jan 16 2023, 5:16 AM.

Details

Summary

[LoopVectorize] Enable integer Mul and Add as select reduction patterns

This patch vectorizes Phi node loop reductions for select's whos condition
comes from a floating-point comparison, with its operands being integers
for Add, Sub, and Mul reductions.

Example:

int foo(float *x, int n) {
    int sum = 0;
    for (int i=0; i<n; ++i) {
        float elem = x[i];
        if (elem > 0) {
            sum += 2;
        }
    }
    return sum;
}

This would previously fail to vectorize due to the integer reduction.

Diff Detail

Event Timeline

MattDevereau created this revision.Jan 16 2023, 5:16 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 16 2023, 5:16 AM
MattDevereau requested review of this revision.Jan 16 2023, 5:16 AM
MattDevereau added inline comments.Jan 16 2023, 5:21 AM
llvm/test/Transforms/LoopVectorize/if-reduction.ll
826

Unfortunately integer flags aren't being propagated here. After having a quick look around the issue appears non-trivial as fast-math flags are propagated for the floating point case with a disclaimer. In RecurrenceDescriptor::AddReductionVar just after where the changes to RecurrenceDescriptor::isConditionalRdxPattern were made:

// FIXME: FMF is allowed on phi, but propagation is not handled correctly.
if (isa<FPMathOperator>(ReduxDesc.getPatternInst()) && !IsAPhi) {
  FastMathFlags CurFMF = ReduxDesc.getPatternInst()->getFastMathFlags();
  if (auto *Sel = dyn_cast<SelectInst>(ReduxDesc.getPatternInst())) {
    // Accept FMF on either fcmp or select of a min/max idiom.
    // TODO: This is a hack to work-around the fact that FMF may not be
    //       assigned/propagated correctly. If that problem is fixed or we
    //       standardize on fmin/fmax via intrinsics, this can be removed.

After a look around for methods of propagating the IR flags I'm not quite sure how to proceed.

Seems like a sensible change to me.

nit: Could you add a motivating (C/C++ example) to the commit message?

llvm/test/Transforms/LoopVectorize/if-reduction.ll
826

I wouldn't be too worried about this, it seems the nsw/nuw flags aren't propagated for other reductions either.

842

nit: I guess it also works if you remove this fast comparison right?

858

What is the difference between this function and @fcmp_0_add_select1 ?

MattDevereau marked 2 inline comments as done.
MattDevereau edited the summary of this revision. (Show Details)
MattDevereau added inline comments.
llvm/test/Transforms/LoopVectorize/if-reduction.ll
826

Very well, in the C/C++ example I've added the flags aren't propagated either when 1 is used as an immediate and we don't go through the select route.

842

You're correct, yes. I'll go ahead and remove it from the tests I've added.

858

The data type being reduced is i64 in this function whereas its i32 in @fcmp_0_add_select1. These tests are integer clones of @fcmp_0_fadd_select1 and @fcmp_0_fadd_select2 above in this file, with the exception that these integer variants add an immediate instead of loaded in data.

MattDevereau edited the summary of this revision. (Show Details)Jan 17 2023, 4:50 AM
MattDevereau edited the summary of this revision. (Show Details)Jan 17 2023, 5:36 AM
sdesmalen accepted this revision.Jan 23 2023, 8:57 AM
This revision is now accepted and ready to land.Jan 23 2023, 8:57 AM
This revision was landed with ongoing or failed builds.Jan 25 2023, 5:25 AM
This revision was automatically updated to reflect the committed changes.

Hi,

A heads up that I see a miscompile that I bisected back to this patch. I don't have a reproducer I can share yet but I'm working on it.

Hi,

A heads up that I see a miscompile that I bisected back to this patch. I don't have a reproducer I can share yet but I'm working on it.

I think the miscompile is exposed by this example:

opt -passes="loop-vectorize" bbi-78206.ll -S -o - -force-vector-width=4

(I'm just using -force-vector-width=4 since VF 4 is what I got for my out of tree target when I saw the miscompile. It's probably not required.)

The input function does a backward search through @table and when it finds an element larger than the input parameter @val, it remembers the index -1 of that element.
Finally it returns the last found index -1.

Ok. With this patch, the vectorizer triggers and it seems like it does not only return _the_ lowest index -1, but in this case it returns the _sum_ of the different "index - 1" for the large enough elements.

So e.g. if @val is 4660 the input function will return 11-1=10.
But with this patch and after vectorization it returns 12-1 + 11-1=21.

Hi @uabelho, Thanks for the report and reproducer. It seems this snippet of code is incorrectly deducing that this is a reduction. I shall revert this patch and make ammends.

MattDevereau reopened this revision.Jan 26 2023, 4:04 AM
This revision is now accepted and ready to land.Jan 26 2023, 4:04 AM

Hi @uabelho, Thanks for the report and reproducer. It seems this snippet of code is incorrectly deducing that this is a reduction. I shall revert this patch and make ammends.

Thanks!

This patch landed a few days ago but was reverted due to a miscompile.

I've added the tests non_reduction_index and non_reduction_index_half which are reproducers for the miscompile. One of the operands of the binary op in a select of a binary op against a phi node must be the false select operand in order to be reduced. This error caused a miscompile of what should be an index decrement to be a vectorized reduction.

I've verified that I don't see the miscompile anymore with the updated patch. I have not done any other wider testing.