This is an archive of the discontinued LLVM Phabricator instance.

[SLPVectorizer] Reorder operands of shufflevector if it can result in a vectorized code.
ClosedPublic

Authored by karthikthecool on Dec 16 2014, 1:49 AM.

Details

Summary

Hi All,
The below code was not being vectorized in vector shuffle in SLPVectorizer.

float fa[4],fb[4],fc[4];
void fn() {
  fc[0] = fb[0]+fa[0];
  fc[1] = fa[1]-fb[1];
  fc[2] = fa[2]+fb[2];
  fc[3] = fa[3]-fb[3];
}

This was because we were to take the operands in the given order fb[0] and fa[1] are not consecutive access. But since '+' is commutative for both float and int for which we handle Shuffle Vector. Hence we can reorder the addition fb[0] + fa[0] -> fa[0] + fb[0] in which case buildTree_rec will be able to conclude it as a consecutive load and vectorize the same.

In this patch we check if we can reorder commutative operations in AltShuffle which can result in vectorization if yes we reorder the operands of the commutative operation.

Please let me know if this is good to commit.

Thanks and Regards
Karthik Bhat

Diff Detail

Repository
rL LLVM

Event Timeline

karthikthecool retitled this revision from to [SLPVectorizer] Reorder operands of shufflevector if it can result in a vectorized code..
karthikthecool updated this object.
karthikthecool edited the test plan for this revision. (Show Details)
karthikthecool set the repository for this revision to rL LLVM.
karthikthecool added a subscriber: Unknown Object (MLST).

Hi Karthik,

This patch is very similar to http://reviews.llvm.org/D6675. Do you think it's possible to reuse a reordering code from one patch in another?

Hi Michael,
I went through the link http://reviews.llvm.org/D6675. The swap operation is different for these 2 patches. The patch in the http://reviews.llvm.org/D6675 tries to swap across lanes and is valid only for interger operands.

In this patch though we swap within the same lane when the operand is commutative so it can be applied to floating point operands as well because float addition etc are commutative but NOT necessarily transitive.
I feel having seperate function to handle this case would be better. But please let me know if you feel otherwise; will see if i can get some common parts out for the 2 patches.

Thanks and Regards
Karthik Bhat

aschwaighofer added inline comments.Jan 7 2015, 9:51 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
343–349

Why does that only apply to altShuffle operands shouldn't this work on any commutative binary vector operation?

1738–1908

Can you add some documentation to this function. Something like

// Reorder operands of commutative operations if the resulting vectors are consecutive loads.

Hi Arnold,
Please find my comments inline. I will add more documentation to the function if this looks good with you.
Thanks and Regards
Karthik Bhat

lib/Transforms/Vectorize/SLPVectorizer.cpp
343–349

Hi Arnold,
The reason why we applied reorderAltShuffleOperands to altShuffle operands is that in case of code such as-

fc[0] = fa[0]+ fb[0]; 
fc[1] = fb[1]+fa[1];  // operands have been swapped in code.
fc[2] = fa[2]+fb[2];
fc[3] = fa[3]+fb[3];

It already gets vectorized and reordring happens in reorderInputsAccordingToOpcode function.

But reorderInputsAccordingToOpcode cannot be called in case of AltShuffleOperands as all operations used in AltShuffleOperands are not commuitative (i.e. add is commutiative but sub is not). Hence we added reorderAltShuffleOperands to handle ordering in case of shuffle vector with alt opcode.

Hi Karthik,

Thanks for the answer, I agree with you.

Please also see a comment from me inline.

Thanks for working on this!

lib/Transforms/Vectorize/SLPVectorizer.cpp
344–346

I don't think reorderInputsAccordingToOpcode currently handle it. I.e. it can accidentally handle it in some cases, but it doesn't do that always. For example the following code doesn't get vectorized:

define void @foo() #0 {
  %1 = load i32* getelementptr inbounds ([1000 x i32]* @a, i32 0, i64 0), align 4
  %2 = load i32* getelementptr inbounds ([1000 x i32]* @b, i32 0, i64 0), align 4
  %3 = add nsw i32 %1, %2
  store i32 %3, i32* getelementptr inbounds ([1000 x i32]* @c, i32 0, i64 0), align 4
  %4 = load i32* getelementptr inbounds ([1000 x i32]* @a, i32 0, i64 1), align 4
  %5 = load i32* getelementptr inbounds ([1000 x i32]* @b, i32 0, i64 1), align 4  

  ; Please note that %4 and %5 are swapped in the following line:
  %6 = add nsw i32 %5, %4

  store i32 %6, i32* getelementptr inbounds ([1000 x i32]* @c, i32 0, i64 1), align 4
  %7 = load i32* getelementptr inbounds ([1000 x i32]* @a, i32 0, i64 2), align 4
  %8 = load i32* getelementptr inbounds ([1000 x i32]* @b, i32 0, i64 2), align 4
  %9 = add nsw i32 %7, %8
  store i32 %9, i32* getelementptr inbounds ([1000 x i32]* @c, i32 0, i64 2), align 4
  %10 = load i32* getelementptr inbounds ([1000 x i32]* @a, i32 0, i64 3), align 4
  %11 = load i32* getelementptr inbounds ([1000 x i32]* @b, i32 0, i64 3), align 4
  %12 = add nsw i32 %10, %11
  store i32 %12, i32* getelementptr inbounds ([1000 x i32]* @c, i32 0, i64 3), align 4
  ret void
}

It might make sense to handle such cases explicitly, like you do for altShuffles.

Hi Michael,
Thanks for the inputs. Please find my comments inline.
Thanks
Karthik Bhat

lib/Transforms/Vectorize/SLPVectorizer.cpp
344–346

Hi Michael,
Thanks for the inputs. I feel the reason the above code doesn't get vectorized is because on 64 bit machine the GVN pass combines the 2 32 bit load into a 64 bit load as a result the pattern match in SLP fails. You can reffer to D6654 for the same. If we run GVN pass after SLPVectorizer the above code gets vectorized.
SLP vectorizer not being able to vectorize widned load is a seperate issue and i plan to work on it shortly.

I will also try to see if i can see any example were we need to handle this case in reorderInputsAccordingToOpcode.

Actually I tried to add this pattern matching at the end of reorderInputsAccordingToOpcode but it results in few regressions in "operandorder.ll" by reordering "good" source order . I'm trying to debug this seperatly as well.

Hi Karthik,

While the loads widening is a real problem, the example I wrote in previous comment doesn't need any GVN invocation at all - you can run slp (+basicaa) on it and see that SLP fails to vectorize it. To make it even clearer, we can use double instead of i32:

  1. Vectorized:

Original code:

double a[1000], b[1000], c[1000];
void foo()
{
  c[0] = a[0] + b[0];
  c[1] = a[1] + b[1];
}

IR:

define void @foo() #0 {
  %1 = load double* getelementptr inbounds ([1000 x double]* @a, i32 0, i64 0), align 4
  %2 = load double* getelementptr inbounds ([1000 x double]* @b, i32 0, i64 0), align 4
  %3 = fadd double %1, %2
  store double %3, double* getelementptr inbounds ([1000 x double]* @c, i32 0, i64 0), align 4
  %4 = load double* getelementptr inbounds ([1000 x double]* @a, i32 0, i64 1), align 4
  %5 = load double* getelementptr inbounds ([1000 x double]* @b, i32 0, i64 1), align 4
  %6 = fadd double %4, %5
  store double %6, double* getelementptr inbounds ([1000 x double]* @c, i32 0, i64 1), align 4
  ret void
}

IR after SLP:

define void @foo() #0 {
  %1 = load <2 x double>* bitcast ([1000 x double]* @a to <2 x double>*), align 16, !tbaa !2
  %2 = load <2 x double>* bitcast ([1000 x double]* @b to <2 x double>*), align 16, !tbaa !2
  %3 = fadd <2 x double> %1, %2
  store <2 x double> %3, <2 x double>* bitcast ([1000 x double]* @c to <2 x double>*), align 16, !tbaa !2
  ret void
}
  1. Not vectorized:

Original code:

double a[1000], b[1000], c[1000];
void foo()
{
  c[0] = a[0] + b[0];
  c[1] = b[1] + a[1]; // a[1] and b[1] are swapped
}

IR:

define void @foo() #0 {
  %1 = load double* getelementptr inbounds ([1000 x double]* @a, i32 0, i64 0), align 4
  %2 = load double* getelementptr inbounds ([1000 x double]* @b, i32 0, i64 0), align 4
  %3 = fadd double %1, %2
  store double %3, double* getelementptr inbounds ([1000 x double]* @c, i32 0, i64 0), align 4
  %4 = load double* getelementptr inbounds ([1000 x double]* @a, i32 0, i64 1), align 4
  %5 = load double* getelementptr inbounds ([1000 x double]* @b, i32 0, i64 1), align 4
  %6 = fadd double %5, %4    ; %4 and %5 are swapped
  store double %6, double* getelementptr inbounds ([1000 x double]* @c, i32 0, i64 1), align 4
  ret void
}

IR after SLP:

define void @foo() #0 {
  %1 = load double* getelementptr inbounds ([1000 x double]* @a, i64 0, i64 0), align 16, !tbaa !2
  %2 = load double* getelementptr inbounds ([1000 x double]* @b, i64 0, i64 0), align 16, !tbaa !2
  %3 = load double* getelementptr inbounds ([1000 x double]* @b, i64 0, i64 1), align 8, !tbaa !2
  %4 = load double* getelementptr inbounds ([1000 x double]* @a, i64 0, i64 1), align 8, !tbaa !2
  %5 = insertelement <2 x double> undef, double %1, i32 0
  %6 = insertelement <2 x double> %5, double %3, i32 1
  %7 = insertelement <2 x double> undef, double %2, i32 0
  %8 = insertelement <2 x double> %7, double %4, i32 1
  %9 = fadd <2 x double> %6, %8
  store <2 x double> %9, <2 x double>* bitcast ([1000 x double]* @c to <2 x double>*), align 16, !tbaa !2
  ret void
}

One correction to my last reply: I wrote that SLP doesn't vectorize the example, which is not technically correct. It vectorizes the code, but the code is far from optimal.

Hi Michael,
Thanks for the clarifications. Updated the code to handle reordering for any commutative binary vector operation were it is profitable.
Post this we are able to optimally vectorize your example.
Please let me know if this looks good to you.

Thanks and Regards
Karthik Bhat

Hi Michael,
One small update in the patch. I was not handling case were we have right operand as load and left as some other binary operation in case of reorderAltShuffleOperands as a result we were not vectorizing code such as -

void foo(double* c,double* restrict a,double* restrict b,double* restrict d) {
  c[0] = (a[0] + b[0])-d[0];
  c[1] = d[1]+(a[1]+b[1]);
}

This is fixed with this updated patch. This is not required in reorderInputsAccordingToOpcode as it is already handled while we sort as per the opcode.
Please let me know your inputs on the same.

Thanks
Karthik Bhat

Hi Karthik,

Thanks for the updated patch, it looks good to me except some minor points (you could find my remarks inline).

Thanks,
Michael

lib/Transforms/Vectorize/SLPVectorizer.cpp
344–349

Do we really need these functions to be public?

1875–1876

Something wrong with formatting here.

1888

A missing sentence end after 'tree'?

1899–1907

To minimize change in the current behavior, we could place this check before the last loop. In this case we won't try to reorder the operands if one part is splat or 'allSameOpcode'. Also, we'll get rid of needsReordering flag.

test/Transforms/SLPVectorizer/X86/operandorder.ll
254–256

C-equvalent in comments would be useful here and before other test cases.

Hi Michael,
Thanks for your time and review comments.
Updated the patch addressing the review comments.

Thanks
Karthik Bhat

Hi Michael,
While still at this patch fix one more issue(small change) originally present in reorderInputsAccordingToOpcode.

In reorderInputsAccordingToOpcode AllSameOpcodeLeft and AllSameOpcodeLeft should return true if all the opCode on left and right respectively are same. But alas in the code we can see that in the loop we are resetting the value to-

AllSameOpcodeLeft = I0;
AllSameOpcodeRight = I1;

for each iteration. As a result we end up checking if the last 2 instructions have the same opcode in left and right.
As a result a code such as -

void foo(float* restrict a,float* restrict b,float* restrict c, float* restrict d)
{
  a[0] = (b[0]+c[0])+d[0];
  a[1] = d[1]+(b[1]+c[1]);
  a[2] = (b[2]+c[2])+d[2];
  a[3] = (b[3]+c[3])+d[3];
}

results in unoptimal vectorization as it considers all opcode on left and right to be same(since the last 2 have same opcode on left and right) and doesn't reorder the 2nd instruction which could have resulted in a better vectorized code.
This updated patch fixes this issue and added a test case for the same.

Please let me know your inputs on the same.

Thanks and Rergards
Karthik Bhat

Hi Karthik,

Thanks for the updated patch, please see my comments inline.

As for the AllSameOpcode changes - your fix looks right anyway, and I think it might go independently of this patch.

lib/Transforms/Vectorize/SLPVectorizer.cpp
1875–1876

You might want to return from here. Otherwise, the broadcast/allSameOpcode could be spoiled by the following it code.

1896

This check looks strange. Why do we bail out if e.g. left-operands are the same? If all left-elements are the same, we should have AllSameOpcode, otherwise I see no special value in keeping them together.

I guess that here you wanted to discard some cases, when we don't want to change anything not to spoil already good disposition. If that's the case, I think we need a better check to detect such cases (e.g. Left[i] and Left[i+1] are consecutive or something like this).

1899

It is possible that Left[i] and Right[i+1] are not consecutive, yet the swap is beneficial.

See the following example:

a[0] + b[0]
b[1] + c[1]
c[2] + d[2]
d[3] + e[3]

In this scenario, the current code won't swap anything, but it could be transformed to

a[0] + b[0]
c[1] + b[1]
c[2] + d[2]
e[3] + d[3]

if we check Right[i] Vs Left[i+1] too.

It's an artificial example though, and maybe it's not worth a code to handle it (but I'm not sure if the original code we are trying to handle isn't artificial as well). It would be great btw if you share the motivation for this changes - have you seen something like this in real-world tests?

test/Transforms/SLPVectorizer/X86/addsub.ll
254

Missed ; at the end of the line :)

280–283

What about C-code for this test?

Hi Michael,
Thanks for the review. Please find the updated patch and my comments inline.
Please let me know if this is good to commit.
Thanks and Regards
Karthik Bhat

Please find my comments inline. Thanks.

lib/Transforms/Vectorize/SLPVectorizer.cpp
1875–1876

We cannot return from here as in case of allSameOpcode as well we still might want to reorder.
For instance in the example which you mentioned in previous comments-
e.g.

c[0] = a[0] + b[0];
c[1] = b[1] + a[1]; // a[1] and b[1] are swapped

all the left and right operands have same opcode i.e. load. So AllSameOpcodeRight and AllSameOpcodeLeft would be true but we still want to reorder those loads if they were commutative across instruction as in the example.
Yes if this was a broadcast we might not want to reorder and keep original order for which we have the check in the below code as -

!(Left[j] == Left[j + 1] || Right[j] == Right[j + 1])

This implies that if this insturction which we are trying to reorder is part of a broadcast then do not reorder.

1896

Hi Michael,
As mentioned in previous comments. AllSameOpcode just checks if all the opcodes of the instructions are same not necessarily the instructions itself. To check if this is a broadcast we need to compare the actual instructions. Hence this check was added to make sure that we do not reorder anything that is part of broadcast.

Now that i think about it i think we can check the LeftBroadcast and RightBroadcast before entering the loop and avoid this check inside the loop.

1899

I initally wanted to just handle this for shuffle vectors as I had found some hand written code in our codebase which had that pattern but extended it generaically for any commutative operands as per review comments from arnold. As you mentioned I'm not sure if it is worth to handle all combinations but will add this check anyway as it doesn't seem to be a lot of overhead.

Hi Karthik,

Thanks for the comments, please find my answers inline (there is a couple of them in the tests too).

lib/Transforms/Vectorize/SLPVectorizer.cpp
1896

Yep, I'd prefer checking for broadcast before the loop.

Actually, checking for AllSameOpcode before the reordering loop now seems useless - anyway it's invalidated when we swap operands. So, what do you think we should do here? Check for broadcasts, reorder commutative operands, check for AllSameOperand, then revert to original version if all of these attempts failed?

2308–2311

My understanding is that we only vectorize ShuffleVectors if it's AltInst, which means VL0 should always be a BinaryOperator. Am I wrong here? If I'm not, please add an assert and remove if-else here.

Hi Michael,
Thanks for the reveiw. Addressed all review comments.
I'm ok with the current way we are handling reorderInputsAccordingToOpcode. If we were to move the load matching before checking AllSameOpcodeRight we would need to maintain a flag so that we do not reorder operands that were swapped during consecutive load matching.(This would end up similar to the initial version of the patch that was uploaded with needsReordering flag). I do not see much difference in the 2 approaches. I hope we can keep this as it is?

Please let me know if i can go ahead and commit this into mainline or if you have any other comments.
Thanks a lot.

Regards
Karthik Bhat

Hi Karthik,

Thanks for the fixes, the patch mostly looks ok, but I still have a bad feeling regarding reorderInputsAccordingToOpcode.
Now we have two checks for if(!(LeftBroadcast || RightBroadcast)) in a row, which doesn't look good. And the overall logic of the function becomes really fuzzy.

You're right that it resembles the original approach. However, I don't think needToReorder flag solves the problem in a good way either. To solve it properly, we need to set our priorities first by answering simple questions:

  1. If the original operands formed a broadcast, do we want to touch them?
  2. What do we prefer: get AllSameOpcode operands, or operands, some of which are consecutive?
  3. What if after swapping to create consecutive operands we lose AllSameOpcode property?
  4. What if operands are AllSameOpcode, but not consecutive?
  5. What if left operands are broadcast, but if we swap one of them with a right one, we'll get consecutive access in the right operands?

...

I guess that we actually only care about cases in which we can make operands both consecutive, and 'AllSameOpcode' - if either of these two conditions is false, we most probably end up with no-vectorization. Thus, we want to maintain both properties at the same time, bailing out if that's not possible.

As for the broadcasting, I think it's always bad to break it, so once we discover it, we want to keep it.

Do you agree with these general strategies, or do you have another opinion?

If we agree on them, it'll be easy to efficiently align the code around them, like the following:

  1. Sort the operands as we do right now.
  2. Check for broadcasts, if isSplat(Left) or isSplat(Right) - return.
  3. Try to reorder things to create as many consecutive loads as possible.
  4. Check if we have AllSameOpcode in either left, or right operands. If yes, return Left and Right, otherwise - LeftOrig and RightOrig.

Hi Michael,
Thanks for the review. Addressed review comments after slight modification in code flow. This version does minimal changes in reorderInputsAccordingToOpcode and to make logic clear i have added appropriate comments in the code.
The only change is the check we have added at the end of reorderInputsAccordingToOpcode to reorder operands to create a longer vectorizable chain without effecting AllSameOpcode property.

I hope this makes things clearer in reorderInputsAccordingToOpcode?

Please find the updated patch and comments inline-

  1. If the original operands formed a broadcast, do we want to touch them? > No. As you mentioned if we detect a broadcast we do not reorder them. We can return as soon as we detect a broadcast.
  1. What do we prefer: get AllSameOpcode operands, or operands, some of which are consecutive? > If we have AllSameOpcode operands reodering it so that consecutive loads are grouped together will not change AllSameOpcode property. > This case (i.e AllSameOpcode and code hits our swap logic) is only possible when we have all loads in left and right side of the binary > operation.)
  1. What if after swapping to create consecutive operands we lose AllSameOpcode property? > This is not possible as far as i can tell. > If we have AllSameOpcode and if we enter in to the condition to swap to create consecutive operands. It means that we have all loads in > left and right lane. Hence swaping will retain AllSameOpcode property as we will still have loads on either side after swap.
  1. What if operands are AllSameOpcode, but not consecutive? > Our logic should not alter anything in this scenario.
  1. What if left operands are broadcast, but if we swap one of them with a right one, we'll get consecutive access in the right operands? > As you mentioned we don't want to disturb boardcast. Hence we do not do anything here.

Comments on code flow suggested -

  1. Sort the operands as we do right now. > OK
  2. Check for broadcasts, if isSplat(Left) or isSplat(Right) - return. > OK
  3. Try to reorder things to create as many consecutive loads as possible. > OK
  4. Check if we have AllSameOpcode in either left, or right operands. If yes, return Left and Right, otherwise - LeftOrig and RightOrig. > We will have a problem here. As mentioned above in case we have AllSameOpcode there are 2 cases which we have to handle- > 1) If operands are AllSameOpcode but reordering will result in consecutive loads and retain *AllSameOpcode* property. > 2) If operands are AllSameOpcode but not consecutive loads. > So in the 1st case above we would like to reorder operands but in the second case we return the LeftOrig and RightOrig. > In both these case AllSameOpcode is the same but in one case we want to reorder and other case we want to return LeftOrig and RightOrig. > So if we follow this code flow any operands reordered in step 3 will be discarded and we will return LeftOrig and RightOrig preventing > vectorization.

So the final code flow will be something like -

  1. Sort the operands as we do right now. (Same as in original code)
  2. If broadcast then return. (Same as in original code but we return now as you suggested)
  3. Check if we have AllSameOpcode if yes retain the original left/right order (same as original code)
  4. Check if we can reorder the opcodes without disturbing good operand order if yes reorder the same. (Our logic)

Thanks
Karthik Bhat

Hi Micahel, sorry i think the comments above are not readable.Some problem with phabricator formatting.
Updated comments to make if more readable.

Please find the updated patch and comments inline-

1.If the original operands formed a broadcast, do we want to touch them?

No. As you mentioned if we detect a broadcast we do not reorder them. We can return as soon as we detect a broadcast.

2.What do we prefer: get AllSameOpcode operands, or operands, some of which are consecutive?

If we have AllSameOpcode operands reodering it so that consecutive loads are grouped together will not change AllSameOpcode property.
This case (i.e AllSameOpcode and code hits our swap logic) is only possible when we have all loads in left and right side of the binary
operation.)

3.What if after swapping to create consecutive operands we lose AllSameOpcode property?

This is not possible as far as i can tell.
If we have AllSameOpcode and if we enter in to the condition to swap to create consecutive operands. It means that we have all loads in
left and right lane. Hence swaping will retain AllSameOpcode property as we will still have loads on either side after swap.

4.What if operands are AllSameOpcode, but not consecutive?

Our logic should not alter anything in this scenario.

5.What if left operands are broadcast, but if we swap one of them with a right one, we'll get consecutive access in the right operands?

As you mentioned we don't want to disturb boardcast. Hence we do not do anything here.

Comments on code flow suggested -

1.Sort the operands as we do right now.

OK

2.Check for broadcasts, if isSplat(Left) or isSplat(Right) - return.

OK

3.Try to reorder things to create as many consecutive loads as possible.

OK

4.Check if we have AllSameOpcode in either left, or right operands. If yes, return Left and Right, otherwise - LeftOrig and RightOrig.

We will have a problem here. As mentioned above in case we have AllSameOpcode there are 2 cases which we have to handle-
1) If operands are AllSameOpcode but reordering will result in consecutive loads and retain *AllSameOpcode* property. 
2) If operands are AllSameOpcode but not consecutive loads.
So in the 1st case above we would like to reorder operands but in the second case we return the LeftOrig and RightOrig. 
In both these case AllSameOpcode is the same but in one case we want to reorder and other case we want to return LeftOrig and RightOrig.

So the final code flow will be something like -

  1. Sort the operands as we do right now. (Same as in original code)
  2. If broadcast then return. (Same as in original code but we return now as you suggested)
  3. Check if we have AllSameOpcode if yes retain the original left/right order (same as original code)
  4. Check if we can reorder the opcodes without disturbing good operand order if yes reorder the same. (Our logic)

Thanks
Karthik Bhat

Hi Karthik,

Thanks for updating the patch according to my numerous remarks:) To be honest, I think that the current path looks good, but I want to make sure we've considered all the possible cases before it gets in to the trunk. That'll guarantee that we do what we want to do in all the cases and the produced code is optimal.

Now back to the comments.


  1. If the original operands formed a broadcast, do we want to touch them?

No. As you mentioned if we detect a broadcast we do not reorder them.
We can return as soon as we detect a broadcast.

  1. What if left operands are broadcast, but if we swap one of them with a right one,

we'll get consecutive access in the right operands?

As you mentioned we don't want to disturb boardcast.
Hence we do not do anything here.

Great that we've agreed on this, it'll help to hoist the broadcast check and forget about it:)


  1. What do we prefer: get AllSameOpcode operands,

or operands, some of which are consecutive?

If we have AllSameOpcode operands reodering it so that consecutive loads
are grouped together will not change AllSameOpcode property.
This case (i.e AllSameOpcode and code hits our swap logic) is only possible
when we have all loads in left and right side of the binary operation.)

  1. What if after swapping to create consecutive operands we lose AllSameOpcode property?

This is not possible as far as i can tell.
If we have AllSameOpcode and if we enter in to the condition
to swap to create consecutive operands. It means that we have all loads in
left and right lane. Hence swaping will retain AllSameOpcode property
as we will still have loads on either side after swap.

  1. What if operands are AllSameOpcode, but not consecutive?

Our logic should not alter anything in this scenario.

That's not exactly true. Consider the following example:

b[0]     b[0]
b[1]     a[0]+c[0]
b[2]     a[1]*c[1]
b[3]     d[5]/e[6]

Here we have AllSameOpcodeLeft=false and AllSameOpcodeRight=true. However, if we swap b[1] and a[0]+c[0] (since Right[0] and Left[1] are consecutive), we'll lose a good candidate for vectorization.

This example not only shows, how we can lose AllSameOpcode property, but also how we can disturb initially good order by swapping operands.

And to reiterate: of course this is a constructed artificial case, and we most likely won't see anything like this in real tests. But such cases still should be considered since they can reveal flaws in our logic. I.e. it's fine if we can modify our logic to do the best in every case. But it's also fine if we decide to ignore such (hopefully rare) cases to keep the logic simple, but in this case I would prefer a comment explaining that it was an intentional, not just a random decision.

Thanks,
Michael

lib/Transforms/Vectorize/SLPVectorizer.cpp
1808–1809

Maybe it's better to rename I0 to ILeft, and I1 to IRight, or something like this? (the same for V0, P0 and others). What do you think?

Hi Michael,
Thanks for the input. I really appreciate your help in the review process. I will rename the variables and update the patch as per review comments shortly.
Just one clarification here.
In your example -

b[0]     b[0]
b[1]     a[0]+c[0]
b[2]     a[1]*c[1]
b[3]     d[5]/e[6]

Post reordering i think we will still have good load order but in the right lane instead of left lane. If I'm not wrong we would get something like-

    b[0]     b[0]
a[0]+c[0]    b[1]
a[1]*c[1]    b[2]    
d[5]/e[6]    b[3]

retaining AllSameOpcode on right instead of left now.

Thanks and Regards
Karthik Bhat

Hi Karthik,

Yes, you're right about my example. But we can modify it in the following way:

a[0]+c[0]    b[0]
a[3]/d[3]    b[1]
b[2]         b[2]
a[1]*c[1]    b[3]

If I'm not mistaken, we'll transform it to

a[0]+c[0]    b[0]
a[3]/d[3]    b[1]
b[2]         b[2]
b[3]         a[1]*c[1]

Thanks,
Michael

Hi Michael,
I agree with you on the above example. Added a FIXME comment as per your suggession.
Updated variable names as per review comments.
Thanks and Regards
Karthik Bhat

Thanks, Karthik,

The patch looks good to me.

Michael

karthikthecool accepted this revision.Jan 19 2015, 10:16 PM
karthikthecool added a reviewer: karthikthecool.

Thanks Michael. Commit as r226547

This revision is now accepted and ready to land.Jan 19 2015, 10:16 PM