This is an archive of the discontinued LLVM Phabricator instance.

[SLP] match logical and/or as reduction candidates
ClosedPublic

Authored by spatel on Jul 9 2021, 1:41 PM.

Details

Summary

This has been a work-in-progress for a long time...we finally have all of the pieces in place to handle vectorization of compare code as shown in:
https://llvm.org/PR41312

To do this (see PhaseOrdering tests), we converted SimplifyCFG and InstCombine to the poison-safe (select) forms of the logic ops, so now we need to have SLP recognize those patterns and insert a freeze op to make a safe reduction:
https://alive2.llvm.org/ce/z/NH54Ah

We get the minimal patterns with this patch, but the PhaseOrdering tests show that we still need adjustments to get the ideal IR in some or all of the motivating cases.

Diff Detail

Event Timeline

spatel created this revision.Jul 9 2021, 1:41 PM
spatel requested review of this revision.Jul 9 2021, 1:41 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 9 2021, 1:41 PM

cc @aqjune @nikic @nlopes - if I got this right, it's a nice win from the poison and freeze efforts. :)

Matt added a subscriber: Matt.Jul 9 2021, 2:00 PM
RKSimon added inline comments.Jul 9 2021, 2:08 PM
llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-logical.ll
13

It doesn't have to be part of this - but should we be trying to fold these patterns to a reduction intrinsic ?

; CHECK-NEXT:    [[TMP0:%.*]] = fcmp olt <4 x float> [[T:%.*]], zeroinitializer
; CHECK-NEXT:    [[TMP1:%.*]] = freeze <4 x i1> [[TMP0]]
; CHECK-NEXT:    [[TMP3:%.*]] = call i1 llvm.vector.reduce.and.v4i1([[TMP1]])
RKSimon added inline comments.Jul 9 2021, 2:10 PM
llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-logical.ll
448

any idea why we only match one of the reduction chains?

cc @aqjune @nikic @nlopes - if I got this right, it's a nice win from the poison and freeze efforts. :)

+1, thanks. :)

spatel added inline comments.Jul 10 2021, 6:00 AM
llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-logical.ll
13

We are forming a reduction intrinsic in SLP as we can see in the SLP-only tests.
In this case, we have -O2, so a subsequent InstCombine turns it into bitcast+cmp via:
https://github.com/llvm/llvm-project/blob/d919bca87556548555af0a7aa1239ea64ba4f3e8/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp#L1966

Still need to check what (if any) difference that makes for codegen.

448

I haven't stepped through yet. We did make some adjustments for sorting the reduction ops in previous patches, but I doubt that extended to creating multiple reductions and/or re-running analysis after forming a reduction.

spatel marked 2 inline comments as done.Jul 10 2021, 6:15 AM
spatel added inline comments.
llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-logical.ll
448

To be more specific, this test should be adapted into an SLP-only test - the enhancement will need to happen within SLP to handle mapping reduction ops into multiple reductions in some way.

cc @aqjune @nikic @nlopes - if I got this right, it's a nice win from the poison and freeze efforts. :)

very nice, thank you! 🚀

ABataev added inline comments.Jul 12 2021, 5:51 AM
llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-logical.ll
448

Investigated it, looks like inefficiency in multi-node analysis (we're forming mixed operand nodes, like {extract 0, extract 1, extract 2, extract 3, 0.0, 0.0, 0.0, 0.0} and {1.0, 1.0, 1.0, 1.0, extract 0, extract 1, extract 2, extract 3}, which are considered as gathers). I hope this can be fixed by D101109.

spatel marked an inline comment as done.Jul 12 2021, 7:09 AM
spatel added inline comments.
llvm/test/Transforms/PhaseOrdering/X86/vector-reductions-logical.ll
448

Thanks for checking. I'll add a basic SLP test to this patch, so we have minimal test coverage, and we can add more as needed.

spatel updated this revision to Diff 357938.Jul 12 2021, 7:16 AM
spatel marked an inline comment as done.

Patch updated:

  1. Added SLP-only test ("clamp") to show enhancement opportunity.
  2. Rebased after D105392 (freeze is now hoisted higher in the PhaseOrdering tests).

LGTM - happy for logical_and_icmp_clamp to be handled in a follow up - any more comments?

RKSimon accepted this revision.Jul 14 2021, 2:18 AM

LGTM - cheers

This revision is now accepted and ready to land.Jul 14 2021, 2:18 AM
This revision was landed with ongoing or failed builds.Jul 14 2021, 6:04 AM
This revision was automatically updated to reflect the committed changes.
efriedma added inline comments.
llvm/test/Transforms/SLPVectorizer/X86/reduction-logical.ll
183
spatel added inline comments.Jul 15 2021, 11:37 AM
llvm/test/Transforms/SLPVectorizer/X86/reduction-logical.ll
183

Yes - @nlopes just noted this on the commit page/thread too ( https://reviews.llvm.org/rG25ee55c0baff ).
I didn't notice that we had replaced select insts with logic, so we either need to prevent, restore, or freeze our way out of that.

The partial vectorization poison-leak is hopefully fixed with:
81ce3aa30cc2
And I added more test coverage:
d9abb15774c5
That includes a test where the reduction differs based on costs, and both variants passed Alive2 when I checked.