This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombine] matchBinOpReduction - add partial reduction matching
ClosedPublic

Authored by RKSimon on Jul 21 2019, 3:40 AM.

Details

Summary

This patch adds support for recognising cases where a larger vector type is being used to reduce just the elements in the lower subvector:

e.g. <8 x i32> reduction pattern in a <16 x i32> vector:

<4,5,6,7,u,u,u,u,u,u,u,u,u,u,u,u>
<2,3,u,u,u,u,u,u,u,u,u,u,u,u,u,u>
<1,u,u,u,u,u,u,u,u,u,u,u,u,u,u,u>

matchBinOpReduction returns the lower extracted subvector in such cases, assuming isExtractSubvectorCheap accepts the extraction.

I've only enabled it for X86 reduction sums so far. I intend to enable it for the bitop/minmax cases in future patches, and eventually I think its worth turning it on all the time. This is mainly just a case of ensuring calls to matchBinOpReduction don't make assumptions on the vector width based on the original vector extraction.

Fixes the x86 partial reduction sum cases in PR33758 and PR42023.

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon created this revision.Jul 21 2019, 3:40 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 21 2019, 3:40 AM
RKSimon added inline comments.Jul 21 2019, 4:00 AM
test/CodeGen/X86/phaddsub-extract.ll
2128

We don't optimize the SSE case because the isExtractSubvectorCheap check returns false (SSE has no EXTRACT_SUBVECTOR ops) - we could tweak this to always permit the extraction pre-legalops, or we tweak X86TargetLowering::isExtractSubvectorCheap to handle the case where we know extract_subvector will just end up splitting into legal types.

lebedev.ri added inline comments.Jul 21 2019, 4:08 AM
include/llvm/CodeGen/SelectionDAG.h
1586–1593

Aren't slashes wrong here, should it be \p ?

lib/CodeGen/SelectionDAG/SelectionDAG.cpp
9055–9056

This might warrant a comment/assertion that we do know that i is always less than 31.

RKSimon updated this revision to Diff 210989.Jul 21 2019, 5:13 AM

address review comments

RKSimon marked 3 inline comments as done.Jul 21 2019, 5:19 AM
lebedev.ri added inline comments.Jul 21 2019, 5:22 AM
include/llvm/CodeGen/SelectionDAG.h
1586–1593

(in the original comment too, but that is not specific to the patch)

lib/CodeGen/SelectionDAG/SelectionDAG.cpp
9055–9056

I indeed meant 31, not 32.
Because if i=31, while there's no overflow, you get (unsigned(1)<<31). which is INT_MIN,
and then in

for (int Index = 0; Index < (int)MaskEnd; ++Index)

you'll get at least a signed overflow, and maybe a deadlock.

lebedev.ri added inline comments.Jul 21 2019, 5:25 AM
lib/CodeGen/SelectionDAG/SelectionDAG.cpp
9055–9056

Err, other way around, the starting condition will be 0 s< INT_MIN, and that is false,
so the loop won't run at all.

xbolva00 added inline comments.
lib/CodeGen/SelectionDAG/SelectionDAG.cpp
9056

assert ‘Stages’ , not ‘i’

RKSimon updated this revision to Diff 210994.Jul 21 2019, 6:27 AM

Fix the assertion to ensure Stages is in signed range, not i

Seems ok.

lib/CodeGen/SelectionDAG/SelectionDAG.cpp
9056

Now the assertion is obviously-tautological, and it lost it's meaning - it does not convey the intent ('*this* code will break otherwise'), only some hardcoded limit.

9085–9086

Would it be better to always return via PartialReduction(), and special-case the case with non-partial reduction in PartialReduction() itself?

RKSimon updated this revision to Diff 210998.Jul 21 2019, 7:50 AM

Limit Stages to < 31 to fix signed/unsigned issue

RKSimon marked an inline comment as done.Jul 21 2019, 7:51 AM
RKSimon added inline comments.
lib/CodeGen/SelectionDAG/SelectionDAG.cpp
9085–9086

I don't think so - it gains us nothing and just complicates the logic in PartialReduction()

lebedev.ri accepted this revision.Jul 21 2019, 12:56 PM
lebedev.ri marked an inline comment as done.

This looks ok to me, but maybe wait for one more review.

lib/CodeGen/SelectionDAG/SelectionDAG.cpp
9017

I'm now regretting commenting about it in the first place.
We don't have a problem with Stages==31 yet.
The loop would iterate from 0 to 30, 1 << 30 is not UB,
and won't set signbit so the for (int Index = 0; Index < (int)MaskEnd loop is good too.
If you want to assert for Stages, then it should stay as Stages < 32, but that is trivially-tautological.
Or move the assertion back into the loop and assert 1 < 31, which is what we are really checking.
Or just drop it, this review shows it to be more confusing and controversial than worthwhile.

This revision is now accepted and ready to land.Jul 21 2019, 12:56 PM

@spatel are you OK with this ?

@spatel are you OK with this ?

Yes, LGTM.

This revision was automatically updated to reflect the committed changes.