This is an archive of the discontinued LLVM Phabricator instance.

[X86] Recognize horizontal reduction trees and narrow the width of the later binops.
AbandonedPublic

Authored by craig.topper on Mar 20 2018, 12:02 AM.

Details

Reviewers
RKSimon
spatel
Summary

This patch teaches DAG combine to recognize an extract_subvector of a horizontal reduction step and to reduce the size of the operation. New extract_subvectors will be inserted to propagate the reduction up the tree.

If the starting binop is 512 bits wide, this reduction can allow the later steps to be narrowed to 128/256 bits, where we can use a shorter VEX encoding.
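As a rough illustration of why the narrowing is safe, here is a scalar C++ model of an ADD reduction over 16 x i32 lanes (512 bits). It is only a sketch, not code from the patch: the function names are hypothetical, the undef upper lanes are modeled as zeros, and the 4-lane floor stands in for the 128-bit minimum vector width. The point is that each step only ever reads the low half of the previous result, so computing just that half gives the same lane 0.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Full-width model: every step adds across all 16 lanes, as the
// un-narrowed 512-bit reduction would. Only lane 0 is read at the end.
inline uint32_t reduceFullWidth(std::array<uint32_t, 16> v) {
  for (std::size_t half = 8; half >= 1; half /= 2) {
    // Shuffle the upper `half` lanes down; the remaining lanes would be
    // undef in the real DAG and are modeled as zero here.
    std::array<uint32_t, 16> shuffled{};
    for (std::size_t i = 0; i < half; ++i)
      shuffled[i] = v[i + half];
    for (std::size_t i = 0; i < 16; ++i)
      v[i] += shuffled[i];
  }
  return v[0];
}

// Narrowed model: each step keeps only the lanes a later step can read,
// but never fewer than 4 lanes (assumed 128-bit floor for i32 elements).
inline uint32_t reduceNarrowed(const std::array<uint32_t, 16> &in) {
  std::vector<uint32_t> v(in.begin(), in.end());
  for (std::size_t half = 8; half >= 1; half /= 2) {
    const std::size_t width = std::max<std::size_t>(half, 4);
    std::vector<uint32_t> next(width, 0);
    for (std::size_t i = 0; i < width; ++i) {
      // Lanes past the end of the narrowed vector play the role of undef.
      const uint32_t hi = (i + half < v.size()) ? v[i + half] : 0;
      next[i] = v[i] + hi;
    }
    v = next;
  }
  return v[0];
}

// Usage: both models agree on lane 0 for the lanes 1..16.
inline void checkExample() {
  std::array<uint32_t, 16> a{};
  for (std::size_t i = 0; i < 16; ++i)
    a[i] = static_cast<uint32_t>(i + 1);
  assert(reduceFullWidth(a) == reduceNarrowed(a));
}
```

In this model the binops run at 8, 4, 4 and 4 lanes instead of 16 at every step, which corresponds to the 256-bit and 128-bit operations described above.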

I've put in the ADD, MIN, and MAX instructions so far, but there may be other operations we should support.

I have noticed an oddity due to the order in which DAG combine visits nodes. We visit the last layer before all the FMAX/FMIN nodes get created. This prevents the combine from being recognized. A later DAG combine triggered by type legalization or vector legalization can catch it, but those DAG combines aren't guaranteed to run if nothing was legalized. We could mitigate this by detecting the reduction step at the binop itself and just padding the upper bits with undef, hoping it's used by an extract_subvector? I think that would get properly triggered as we create FMAX in the upper nodes, since the combine will add users back to the worklist. Thoughts?

Event Timeline

craig.topper created this revision. Mar 20 2018, 12:02 AM

Thinking about this again in terms of D47401 - would it be better to focus on making better use of TargetLowering::SimplifyDemandedVectorElts?

I'm not sure SimplifyDemandedElts can do it as it currently exists. All the arithmetic nodes have two users. SimplifyDemandedElts can't handle that, can it?
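To make the "two users" point concrete, here is a toy C++ model of the reduction DAG's shape. It is a hedged sketch only: `ToyNode`, `buildReductionDag` and `countUsers` are hypothetical names, not LLVM APIs, and the model ignores element types and shuffle masks. Each intermediate add feeds both the next step's shuffle and the next step's add, so a simplification that only reasons about single-use nodes would stop at the first of them.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a DAG node: an opcode name plus operand indices.
struct ToyNode {
  std::string op;
  std::vector<std::size_t> operands;
};

// Builds input -> (shuffle, add) repeated `steps` times -> extract.
inline std::vector<ToyNode> buildReductionDag(std::size_t steps) {
  std::vector<ToyNode> dag;
  dag.push_back({"input", {}});
  std::size_t cur = 0; // index of the value being reduced
  for (std::size_t s = 0; s < steps; ++s) {
    dag.push_back({"shuffle", {cur}}); // moves the upper half down
    const std::size_t shuf = dag.size() - 1;
    dag.push_back({"add", {cur, shuf}}); // combines with the shuffled copy
    cur = dag.size() - 1;
  }
  dag.push_back({"extract", {cur}}); // reads the final low element(s)
  return dag;
}

// Counts how many operand slots reference each node.
inline std::vector<std::size_t> countUsers(const std::vector<ToyNode> &dag) {
  std::vector<std::size_t> users(dag.size(), 0);
  for (const ToyNode &n : dag)
    for (std::size_t operand : n.operands)
      ++users[operand];
  return users;
}
```

For `buildReductionDag(4)` (a 16-lane reduction), the adds sit at indices 2, 4, 6 and 8; in this model the first three have two users each, and only the last add, which feeds the extract, has one.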

craig.topper abandoned this revision. Oct 20 2018, 12:41 PM