This is an archive of the discontinued LLVM Phabricator instance.

[SLP] sort candidates to increase chance of optimal compare reduction

Authored by spatel on Sep 16 2020, 8:53 AM.



This is one (small) part of improving PR41312:

As shown there and in the smaller tests here, if we have some member of the reduction values that does not match the others, we want to push it to the end (bring the matching members forward and together).

In the examples here, we have 5 candidates for the 4 slots of the reduction. If the one "wrong" compare is grouped with the others, it prevents forming the ideal v4i1 compare reduction.

Diff Detail

Event Timeline

spatel created this revision.Sep 16 2020, 8:53 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 16 2020, 8:53 AM
spatel requested review of this revision.Sep 16 2020, 8:53 AM
ABataev added inline comments.Sep 16 2020, 9:12 AM

Maybe, better to change this expression too? The main problem here is that the sliding window has step ReduxWidth. Maybe, use 1 here in common case? Also, I don't think sorting is safe here. It is better to keep the original order of the instructions, but pre-select only the matching ones.

spatel added inline comments.Sep 16 2020, 9:35 AM

I'm not sure what the suggestion is exactly - if we allow any reduction width, will we effectively re-open D59710 and all of its problems (miscompiles, perf loss, etc)?

Do you have an idea why sorting is not safe? If I'm seeing it correctly, we have guaranteed that all ops in the reduction are associative and additionally they are all in the same basic block. Therefore, they should be safe to reorder (the reduction itself requires that property?).

ABataev added inline comments.Sep 16 2020, 9:52 AM
  1. Ah, yes, I missed that it works only in case of the successful vectorization attempt. What I meant is to add a way to rebuild the VL in case if vectorization attempt was unsuccessful. Currently, we just early exit in this case and ignore the remaining part of the reduced values.
  2. Yeah, it should be safe. Anyway, could you add a test where the original order of the reduced values is changed by the sorting?
spatel added inline comments.Sep 16 2020, 10:35 AM

Sure - I think that is challenging given the current restrictions in this patch because we also have this limitation in buildTree_rec():

// Check that all of the compares have the same predicate.

But there's a loophole for swapped predicate, so I added a test with that possibility. I think this also shows that it is possible to degrade the reduction if the IR is in some non-standard form, but I'm hoping that is not the common case.

The better solution (assuming this part does not cause trouble) will be to also sort based on the operand values (that's the TODO comment I left in the code).

spatel updated this revision to Diff 292267.Sep 16 2020, 10:36 AM

Patch updated:
Added a test to show forming a better reduction than we used to create.

ABataev added inline comments.Sep 16 2020, 10:39 AM
  1. Agreed, might be a good candidate for future work.
  2. I think we need to implement both, sorting and improvement for the sliding window. It also might trigger an additional vectorization, though need to be very careful here since it also affects the compile time.
ABataev added inline comments.Sep 16 2020, 10:42 AM


spatel updated this revision to Diff 292321.Sep 16 2020, 1:09 PM
spatel marked an inline comment as done.

Patch updated:
Use stable_sort to reduce movement (not sure how to expose a difference in a test though).

This revision is now accepted and ready to land.Sep 16 2020, 1:12 PM