This is an archive of the discontinued LLVM Phabricator instance.

[SLP] Refactoring of the operand reordering code.
ClosedPublic

Authored by vporpo on Mar 28 2019, 6:22 PM.

Details

Summary

This is a refactoring patch which should have all the functionality of the current code. Its goal is twofold:
i. Cleanup and simplify the reordering code, and
ii. Generalize reordering so that it will work for an arbitrary number of operands, not just 2.

This is the second patch in a series of patches that will enable operand reordering across chains of operations. An example of this was presented in EuroLLVM'18 https://www.youtube.com/watch?v=gIEn34LvyNo .

Diff Detail

Repository
rL LLVM

Event Timeline

vporpo created this revision.Mar 28 2019, 6:22 PM

Awesome! Some initial thoughts - you maybe want to checkout D57059 as @ABataev is looking at adding UNDEF support to handle non-power2 types which will probably need to work with VLOperands

lib/Transforms/Vectorize/SLPVectorizer.cpp
616 ↗(On Diff #192753)

Separate pre-commit?

636 ↗(On Diff #192753)

Maybe OperandMode or OperandType?

638 ↗(On Diff #192753)

Its not necessarily the same opcode (might be altopcode etc.)

906 ↗(On Diff #192753)

Better to use Optional<unsigned> for failure?

910 ↗(On Diff #192753)

Could this be static?

912 ↗(On Diff #192753)

ArrayRef<OpMode>

915 ↗(On Diff #192753)

Do we need this? Maybe just put inside reorderInputsAccordingToOpcode?

RKSimon added inline comments.Mar 29 2019, 8:18 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
640 ↗(On Diff #192753)

Matching for extractelement is also an issue: https://bugs.llvm.org/show_bug.cgi?id=41304

vporpo updated this revision to Diff 192958.Mar 29 2019, 8:04 PM
vporpo marked 8 inline comments as done.

I also rebased to support the commutative predicates of D59992.

lib/Transforms/Vectorize/SLPVectorizer.cpp
636 ↗(On Diff #192753)

I am trying to avoid 'Type' because it might get confused with LLVM Type. How about ReorderingMode?

638 ↗(On Diff #192753)

Correct, I will update the comment.

640 ↗(On Diff #192753)

Good point, but let's implement it in a separate patch.

910 ↗(On Diff #192753)

It could, but we need to pass DL and SE for isConsecutiveAccess().

915 ↗(On Diff #192753)

I agree, it is kind of redundant right now.

I have placed it in a standalone function because there will be a second call site for it in the follow-up patch that implements operand reordering across more than 2 operand vectors.

RKSimon added inline comments.Mar 31 2019, 12:51 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
919 ↗(On Diff #192958)

I don't think \reorder is a doxygen tag?

vporpo updated this revision to Diff 193029.Mar 31 2019, 4:16 PM
vporpo marked an inline comment as done.

Fixed doxygen tags.

RKSimon added a subscriber: spatel.Apr 1 2019, 8:08 AM
RKSimon added inline comments.
lib/Transforms/Vectorize/SLPVectorizer.cpp
683 ↗(On Diff #193029)

This isn't used yet - can it wait until you need it? Also - should you use ArrayRef?

749 ↗(On Diff #193029)

If this is now the only use of isCommutative then maybe merge them? Not sure if it needs to be inside VLOperands

640 ↗(On Diff #192753)

@spatel has done this in instcombine which should be good enough.

ABataev added inline comments.Apr 1 2019, 1:08 PM
lib/Transforms/Vectorize/SLPVectorizer.cpp
683 ↗(On Diff #193029)

This must be an ArrayRef definitely

719 ↗(On Diff #193029)

Provide default initializers

735 ↗(On Diff #193029)

{} -> = default;

802 ↗(On Diff #193029)

const member function

811 ↗(On Diff #193029)

make function const

vporpo updated this revision to Diff 193183.Apr 1 2019, 3:22 PM
vporpo marked 7 inline comments as done.

Addressed the comments.

vporpo updated this revision to Diff 193413.Apr 2 2019, 7:04 PM

Rebased + some minor changes.

RKSimon added inline comments.Apr 3 2019, 3:30 AM
test/Transforms/SLPVectorizer/X86/crash_smallpt.ll
35 ↗(On Diff #193413)

This looks like a regression - we're losing an all constant operand here (plus the TMP1 no longer simplifies to a single scalar load).

test/Transforms/SLPVectorizer/X86/operandorder.ll
47 ↗(On Diff #193413)

Hmm - these tests are called 'preserve_broadcast' and we're not preserving the broadcasts any more.... This might not be a bad thing but it'd be good to confirm (and possibly rename the tests if everything is good)

vporpo updated this revision to Diff 193619.Apr 3 2019, 3:56 PM
vporpo marked 2 inline comments as done.

Updated getBestOperand() to better handle undefs. This fixes the regression in crash_smallpt.ll.

vporpo added a comment.EditedApr 3 2019, 4:00 PM

I am not sure how to proceed with the operandorder.ll test. @ABataev do you know why we would give higher priority to broadcasts?

test/Transforms/SLPVectorizer/X86/crash_smallpt.ll
35 ↗(On Diff #193413)

Good catch.
The reason was that we were skipping undefs when trying to match an instruction in getBestOperand().
I updated the code to accept Undefs as a secondary option.
In a later patch we should also add a new ReorderingMode for Undefs.

test/Transforms/SLPVectorizer/X86/operandorder.ll
47 ↗(On Diff #193413)

I am not sure why we would prefer to build a broadcast instead of vectorizing the loads.
@ABataev any idea about this?

ABataev added inline comments.Apr 4 2019, 6:50 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
714 ↗(On Diff #193619)

Remove semicolon

803 ↗(On Diff #193619)

auto &->const OperandDataVec & for better readability (it is not very easy to deduce the type)

3063 ↗(On Diff #193619)

auto & to real type

3224 ↗(On Diff #193619)

Maybу it would be better to encapsulate all this reordering in the VLOperands class rather than exposing some low-level transformators like swap etc.?

test/Transforms/SLPVectorizer/X86/operandorder.ll
47 ↗(On Diff #193413)

I agree, that the vectorization of loads is better.

RKSimon added inline comments.Apr 4 2019, 7:38 AM
test/Transforms/SLPVectorizer/X86/operandorder.ll
47 ↗(On Diff #193413)

OK - rename as "shuffle_vs_broadcast" or something?

vporpo updated this revision to Diff 194009.Apr 6 2019, 12:24 AM
vporpo marked 6 inline comments as done.

Addressed the comments and also moved the reordering-related functions inside the VLOperands class.

vporpo added inline comments.Apr 6 2019, 12:25 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
3224 ↗(On Diff #193619)

Good point, I don't see any reason why reordering should not be a method of VLOperands. I moved all the reordering related methods inside VLOperands.

No more comments from me @ABataev ?

ABataev added inline comments.Apr 8 2019, 7:25 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
643 ↗(On Diff #194009)

Seems to me, this enum is used only in VLOperands class. Better to move it to class itself.

733–734 ↗(On Diff #194009)

Better to use references rather than pointers

736 ↗(On Diff #194009)

Seems to me, this constructor is not used. Transform it into deleted rather than default.

776 ↗(On Diff #194009)

Make it private, since it is used only in class members

862 ↗(On Diff #194009)

I don't think you need braces here

975 ↗(On Diff #194009)

Private?

978 ↗(On Diff #194009)

Private?

981 ↗(On Diff #194009)

Private?

986 ↗(On Diff #194009)

Check these memeber functions and make private as much as possible

1001 ↗(On Diff #194009)

Preallocate memory for this array since you know the size (OpsVec[OpIdx].size())

vporpo updated this revision to Diff 194732.Apr 11 2019, 12:36 PM
vporpo marked 10 inline comments as done.

Addressed @ABataev 's comments.

lib/Transforms/Vectorize/SLPVectorizer.cpp
736 ↗(On Diff #194009)

It is used by OpsVec.resize() so I can't delete it.

975 ↗(On Diff #194009)

I guess we can keep everything private for now except reorder() and getVL()

ABataev added inline comments.Apr 11 2019, 1:13 PM
lib/Transforms/Vectorize/SLPVectorizer.cpp
688 ↗(On Diff #194732)

Use /// style of comments here

848 ↗(On Diff #194732)

Use preincrement

914 ↗(On Diff #194732)

Better to reserve the space when constructing the variable

916 ↗(On Diff #194732)

You should not push here, instead assign to the preallocated elements.

998 ↗(On Diff #194732)

return StringRef.

1031 ↗(On Diff #194732)

Do not use autos here and in the inner loop since it is hard to deduce the real type

1099 ↗(On Diff #194732)

Hmm, can't find the definition of this function.

3222 ↗(On Diff #194732)

Could you make it static? It is strange to see const reorder... function.

vporpo updated this revision to Diff 194797.Apr 11 2019, 6:08 PM
vporpo marked 10 inline comments as done.

Addressed the latest comments.

lib/Transforms/Vectorize/SLPVectorizer.cpp
3222 ↗(On Diff #194732)

Well, it is a member function in order to avoid passing DL and SE as arguments on every call site. Anyway, either is fine with me.

ABataev added inline comments.Apr 12 2019, 9:24 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
962 ↗(On Diff #194797)

Maybe set to Failed here just in case, to prevent possible compiler crash?

991 ↗(On Diff #194797)

Shall we undo previously performed swaps in this case, if we could do some reordering earlier?

vporpo updated this revision to Diff 194910.Apr 12 2019, 10:10 AM
vporpo marked 3 inline comments as done.
vporpo added inline comments.
lib/Transforms/Vectorize/SLPVectorizer.cpp
991 ↗(On Diff #194797)

I don't think undoing the swaps would help much. Following the reordering strategy for as long as we can without undoing on failure may actually help sometimes. For example, if our strategy is to look for the same opcode and half-way through the lanes we can no longer match it, we should still keep the order for the ones found so far, because we could potentially still vectorize these operands with a smaller vector length. If we undo them, that will not be possible.

Revisiting the failed operands and trying again with a different strategy could help. But I think we should try implement such back-tracking approaches in a separate optimization patch.

ABataev added inline comments.Apr 12 2019, 10:57 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
991 ↗(On Diff #194797)

Then, maybe, you should continue trying to reorder other elements in the same lane, and should not mark it as Failed too early? You can save the value somewhere in the temp array and update the original element to Failed state after the loop.

vporpo marked an inline comment as done.Apr 12 2019, 11:30 AM
vporpo added inline comments.
lib/Transforms/Vectorize/SLPVectorizer.cpp
991 ↗(On Diff #194797)

I am not sure I follow. I think we are not on the same page on how the ReorderingMode strategies work. Let me explain.
The ReorderingModes (e.g., Opcode or Failed) are not per-lane, but per operand index (0 or 1 for now).
So even if an operand index is marked as "Failed" at some lane, the rest of the operands of that lane are still trying to reorder the elements.
The idea is that by marking an operand index as "Failed", we are announcing that it has the lowest priority in picking a value. This makes sense, because a "Failed" operand index is guaranteed not to form a full-length vector, but maybe other operands can still succeed, so they should have higher priority in selecting the operand values.

For example, operand 0 might be initialized with the "Opcode" strategy. Then, say that at lane 3 we can no longer find a matching opcode for operand 0, and operand 0 is therefore marked as "Failed". This means that from this point on, getBestOperand() will always return None for operand 0 on any lane, as operand 0 cannot possibly form a full-size vector. If operand 1 has not failed yet, it will be the one selecting the operand values, not operand 0. This is a good strategy, as operand 1may still succeed in forming a full-size vector.

Does this make sense?

ABataev added inline comments.Apr 12 2019, 11:45 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
991 ↗(On Diff #194797)

Yeah, but it may lead to some non-stable results. At first, we anounce that the operation has a high priority and reorder elements according to this high-pripority. Then, somewhere in the middle, we change to it to lower priority and change the reording. Is this really expected behavior? Or maybe it is better to make some preliminary analysis to identify the most successful strategy and only after that do the real reordering?

vporpo marked an inline comment as done.Apr 12 2019, 12:15 PM
vporpo added inline comments.
lib/Transforms/Vectorize/SLPVectorizer.cpp
991 ↗(On Diff #194797)

Ah yes, I agree. Ideally we should back-track and re-decide on the strategy, given that we failed half-way through.
But if we don't back-track and we need to decide in one go, then I don't think there is anything really wrong with what we are doing now. I think it fully covers the functionality of the previous implementation, without adding complexity.

ABataev added inline comments.Apr 12 2019, 12:37 PM
lib/Transforms/Vectorize/SLPVectorizer.cpp
991 ↗(On Diff #194797)

But I think we should improve previous implementation, if this possible, agree? And this is possible. We can try to produce more predictable results rather than end up with the mess in some corner cases. I think this is really worth it.

vporpo marked an inline comment as done.Apr 12 2019, 12:48 PM
vporpo added inline comments.
lib/Transforms/Vectorize/SLPVectorizer.cpp
991 ↗(On Diff #194797)

Yes, I fully agree, and I will be happy to work on it. But why not do it in a separate patch? This one is a refactoring patch that keeps the reordering more or less equivalent to the existing code. The follow-up patch can focus on improving upon this with specific test cases etc.

ABataev added inline comments.Apr 12 2019, 12:58 PM
lib/Transforms/Vectorize/SLPVectorizer.cpp
991 ↗(On Diff #194797)

Hmm, it does not look like a simple NFC patch, since it is already changes the behavior of the compiler. Why not to fix this problem right now, if the main part is implemented already? Make it as a simple 2-pass transformation: preliminary analysis and then actual reordering. We don't need to do the backtracking in this case.

vporpo updated this revision to Diff 194961.Apr 12 2019, 2:10 PM
vporpo marked an inline comment as done.
vporpo added inline comments.
lib/Transforms/Vectorize/SLPVectorizer.cpp
991 ↗(On Diff #194797)

OK, how about this one? If the first pass fails, then we perform a second pass using the modes of the first one. If some operand index failed in the first pass, it will have low priority in the second pass for all lanes.

ABataev added inline comments.Apr 15 2019, 10:00 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
1010 ↗(On Diff #194961)

I think the first pass should not do any swaps.

vporpo marked an inline comment as done.Apr 15 2019, 10:28 AM
vporpo added inline comments.
lib/Transforms/Vectorize/SLPVectorizer.cpp
1010 ↗(On Diff #194961)

Well, if we don't do the swap, then getBestOperand() won't work, as it relies on the state of OpsVec for checking the previous lane vs the current one. We could potentially use some other helper structure to avoid updating the state in OpsVec, but that would complicate the code quite a bit.

Also, I don't see any harm in applying the reordering in the first pass. The worst that can happen is that the order in OpsVec will change in the second pass. Am I missing out something here?

Seems good with a nit

lib/Transforms/Vectorize/SLPVectorizer.cpp
1007 ↗(On Diff #194961)

Enclose substatement into braces

ABataev accepted this revision.Apr 15 2019, 1:38 PM
This revision is now accepted and ready to land.Apr 15 2019, 1:38 PM
vporpo updated this revision to Diff 195328.Apr 16 2019, 1:29 AM
vporpo marked an inline comment as done.

Thank you for the reviews. @RKSimon any more comments ?

RKSimon accepted this revision.Apr 16 2019, 1:54 AM

Thank you for the reviews. @RKSimon any more comments ?

LGTM - cheers

Great, thanks! Please commit the patch , I don't have commit access.

This revision was automatically updated to reflect the committed changes.