This is an archive of the discontinued LLVM Phabricator instance.

Differential D28907

[SLP] Fix for PR30787: Failure to beneficially vectorize 'copyable' elements in integer binary ops.
Needs Revision, Public

Authored by dtemirbulatov on Jan 19 2017, 8:56 AM.

Details

Reviewers

spatel
mzolotukhin
mkuper
hfinkel
RKSimon
filcab
ABataev
MatzeB
javed.absar
anton-afanasyev

Commits

rGe2358b53bc09: [SLPVectorizer] Failure to beneficially vectorize 'copyable' elements in…
rL313348: [SLPVectorizer] Failure to beneficially vectorize 'copyable' elements in…

Summary

Patch tries to improve vectorization of the following code:

void add1(int * __restrict dst, const int * __restrict src) {
  *dst++ = *src++;
  *dst++ = *src++ + 1;
  *dst++ = *src++ + 2;
  *dst++ = *src++ + 3;
}

Currently this code cannot be vectorized because the very first operation is not a binary add, but just a load.
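
To make the problem concrete, the first store can be read as an add of zero, which gives all four lanes the same shape. The sketch below is illustrative only and is not code from the patch: `add1_scalar` mirrors the function above, while `add1_padded` and the constant table `K` are hypothetical names for a hand-written equivalent in which every lane is `src[i] + K[i]`.

```cpp
#include <cassert>

// Scalar form from the summary: lane 0 is a plain copy, lanes 1..3 are adds.
void add1_scalar(int *__restrict dst, const int *__restrict src) {
  *dst++ = *src++;
  *dst++ = *src++ + 1;
  *dst++ = *src++ + 2;
  *dst++ = *src++ + 3;
}

// Hypothetical equivalent: the copy in lane 0 is written as "+ 0", so all
// four lanes are the same binary add and differ only in the constant operand.
void add1_padded(int *__restrict dst, const int *__restrict src) {
  assert(dst && src && "both buffers must hold at least four ints");
  static constexpr int K[4] = {0, 1, 2, 3};
  for (int i = 0; i < 4; ++i)
    dst[i] = src[i] + K[i];
}
```

Both functions store the same four values; the second form is the uniform pattern a four-wide vector add can cover.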

Diff Detail

Event Timeline

dtemirbulatov added inline comments.Jan 30 2018, 7:30 AM

test/Transforms/SLPVectorizer/X86/internal-dep.ll
10 ↗	(On Diff #131752)	ok, Thanks

dtemirbulatov added inline comments.Jan 30 2018, 7:52 AM

lib/Transforms/Vectorize/SLPVectorizer.cpp
3800	probably level 2 or 3 dependencies might be ok since it is not encapsulated in a single operation.

What is happening with this patch? It's been in development for over a year now and still seems to be having problems. Getting PR30787 fixed would be VERY useful and I'm thinking we should be looking at alternatives if this patch is going to carry on stalling/reverting.

I will update the solution shortly. I am currently changing the scheduler in order to avoid non-alternative opcodes in a bundle.

Removed the internal vector dependency check as incorrect, and added a dependency check for partial bundle vectorization with non-alternative operations only, in BoUpSLP::vectorizeTree and BoUpSLP::buildTree.

ABataev added inline comments.Mar 29 2018, 6:34 AM

test/Transforms/SLPVectorizer/SystemZ/pr34619.ll
2 ↗	(On Diff #140208)	Commit this test as a separate NFC patch with the checks against trunk
test/Transforms/SLPVectorizer/X86/partail.ll
2 ↗	(On Diff #140208)	Commit this test as a separate NFC patch with the checks against trunk
test/Transforms/SLPVectorizer/X86/pr35497.ll
2 ↗	(On Diff #140208)	Commit this test as a separate NFC patch with the checks against trunk
test/Transforms/SLPVectorizer/X86/resched.ll
2 ↗	(On Diff #140208)	Commit this test as a separate NFC patch with the checks against trunk

Updated after committing the tests; fixed formatting and deleted the Instruction::ExtractElement restriction from tryToRepresentAsInstArg.

ABataev added inline comments.Mar 30 2018, 9:08 AM

lib/Transforms/Vectorize/SLPVectorizer.cpp
1283	Remove extra parens around `Entry->State.IsNonAlt`. Why are you skipping bundles with non-alternative opcodes only?
2610–2614	Rewrite it this way:
	SmallPtrSet<Instruction *, 4> BundleInst;
	Bundle = Bundle->FirstInBundle;
	LastInst = Bundle->Inst;
	while (Bundle) {
	  BundleInst.insert(Bundle->Inst);
	  Bundle = Bundle->NextInBundle;
	}
2615–2617	Why do you need to scan all the instructions in the basic block starting from `First`? Why can't you use only scheduled instructions?
2618	Move `++Iter` to the header of the `for` loop.
3265	The name of the variable must start with a capital letter.
3268	The code is not formatted. Seems to me you missed `break;`.
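
For readers unfamiliar with the bundle list used in the suggested rewrite, here is a minimal standalone sketch of the same traversal. It assumes a stripped-down node type: `Inst`, `ScheduleNode` and `collectBundle` are hypothetical names, and `std::set` stands in for `SmallPtrSet` so the example compiles without LLVM headers.

```cpp
#include <cassert>
#include <set>

// Hypothetical stand-in for an instruction; only its address matters here.
struct Inst {
  int Id;
};

// Hypothetical, stripped-down scheduling node: every member points at the
// head of its bundle and at the next member (null for the last one).
struct ScheduleNode {
  Inst *I = nullptr;
  ScheduleNode *FirstInBundle = nullptr;
  ScheduleNode *NextInBundle = nullptr;
};

// Collects every instruction of the bundle that contains Member by jumping
// to the bundle head and following NextInBundle until the list ends.
std::set<Inst *> collectBundle(ScheduleNode *Member) {
  assert(Member && Member->FirstInBundle && "node must belong to a bundle");
  std::set<Inst *> BundleInst;
  for (ScheduleNode *N = Member->FirstInBundle; N; N = N->NextInBundle)
    BundleInst.insert(N->I);
  return BundleInst;
}
```

Starting from any member gives the same set, because the walk always begins at `FirstInBundle`.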

Update after Alexey's remarks.

dtemirbulatov added inline comments.Mar 31 2018, 8:10 AM

lib/Transforms/Vectorize/SLPVectorizer.cpp
1283	yes, we don't need to check Entry->State.IsNonAlt here, that is similar to getTreeEntry(U, Scalar) == Entry.
2610–2614	Done. Thanks.
2615–2617	What do you mean? Please elaborate. If you mean ScheduleData here, then it also does not always give the correct sequence of scheduled instructions in a block. For example, a bundle member whose NextInBundle is null is not guaranteed to be the last instruction of this bundle among the other instructions. If we start with First, then it is highly likely that we can iterate across all bundle members and exit instead of iterating to the end of the basic block. Also, to avoid this overhead, we could note the last scheduled instruction in the BB during scheduling in scheduleBlock() and keep this information in the ScheduleData structure.
2618	Done.
3265	Done.
3268	Correct, Thanks.

Rebased, improved complexity of setInsertPointAfterBundle() for already scheduled instructions, minor changes in scheduleBlock()

Also, your code does not seem to be fully formatted. Please use clang-format on your changes to format it properly.

lib/Transforms/Vectorize/SLPVectorizer.cpp
257	Enclose it in braces.
259	This too.
270	bool IsNonAlt = llvm::any_of(VL, [Opcode, AltOpcode](Value *V) {
	  return isa<Instruction>(V) && !sameOpcodeOrAlt(Opcode, AltOpcode, cast<Instruction>(V)->getOpcode());
	});
1366	Add the debug message here.
2611	Why can't you put bundles in the list in the right order: from the very first instruction to the very last?
3267–3274	Better to do it this way:
	if (llvm::any_of(Scalar->users(), [this, Entry, Scalar](User *U) {
	      return !getTreeEntry(U) && getTreeEntry(U, Scalar) == Entry;
	    }))
	  continue;
3796	Remove this, it is not needed.
3806	`pickedInst` -> `PickedInst`
3810–3812	Remove it, it does not do anything.

Update after remarks from Alexey Bataev.

dtemirbulatov added inline comments.Apr 16 2018, 2:27 PM

lib/Transforms/Vectorize/SLPVectorizer.cpp
2611	I could do this in scheduleBlock() function with a queue, but that could add additional complexity.

ABataev added inline comments.Apr 17 2018, 8:10 AM

lib/Transforms/Vectorize/SLPVectorizer.cpp
2611	Could you explain why?

dtemirbulatov added inline comments.Apr 22 2018, 6:19 AM

lib/Transforms/Vectorize/SLPVectorizer.cpp
2611	One instruction could belong to one or more separate bundles... and while we try to change order in bundles at scheduleBlock() we have to update ScheduleDataMap, ExtraScheduleDataMap.

dtemirbulatov added inline comments.Apr 23 2018, 4:09 AM

lib/Transforms/Vectorize/SLPVectorizer.cpp
2611	I mean pseudo operation could occur in more than one bundle.

ABataev added inline comments.Apr 23 2018, 6:24 AM

lib/Transforms/Vectorize/SLPVectorizer.cpp
2611	But these schedule bundles must have different scheduling region ids and they must be in different bundles. Why does their order change?

Implemented post-scheduling bundle reordering of the last element of the bundle according to how it was scheduled.

Herald added a reviewer: javed.absar. May 22 2018, 12:18 AM

dtemirbulatov added inline comments.May 22 2018, 6:00 PM

lib/Transforms/Vectorize/SLPVectorizer.cpp
2611	The bundles are different, but the scheduling region id is the same.

dtemirbulatov added inline comments.May 22 2018, 6:27 PM

lib/Transforms/Vectorize/SLPVectorizer.cpp

2611	I mean, for example, for this function:

define void @add0(i32* noalias %dst, i32* noalias %src) {
entry:
  %incdec.ptr = getelementptr inbounds i32, i32* %src, i64 1
  %0 = load i32, i32* %src, align 4
  %add = add nsw i32 %0, 1
  %incdec.ptr1 = getelementptr inbounds i32, i32* %dst, i64 1
  store i32 %add, i32* %dst, align 4
  %incdec.ptr2 = getelementptr inbounds i32, i32* %src, i64 2
  %1 = load i32, i32* %incdec.ptr, align 4
  %add3 = add nsw i32 %1, 1
  %incdec.ptr4 = getelementptr inbounds i32, i32* %dst, i64 2
  store i32 %add3, i32* %incdec.ptr1, align 4
  %incdec.ptr5 = getelementptr inbounds i32, i32* %src, i64 3
  %2 = load i32, i32* %incdec.ptr2, align 4
  %add6 = add nsw i32 %2, 2
  %incdec.ptr7 = getelementptr inbounds i32, i32* %dst, i64 3
  store i32 %add6, i32* %incdec.ptr4, align 4
  %3 = load i32, i32* %incdec.ptr5, align 4
  %add9 = add nsw i32 %3, 3
  store i32 %add9, i32* %incdec.ptr7, align 4
  ret void
}

We have two bundles:
[ %3 = load i32, i32* %src, align 4; %add3 = add nsw i32 %2, 1; %add6 = add nsw i32 %1, 2; %add9 = add nsw i32 %0, 3]
and
[ %3 = load i32, i32* %src, align 4; %2 = load i32, i32* %incdec.ptr, align 4; %1 = load i32, i32* %incdec.ptr2, align 4; %0 = load i32, i32* %incdec.ptr5, align 4]
with the same instruction %3 = load i32, i32* %src, align 4, which is a pseudo instruction in the bundle [ %3 = load i32, i32* %src, align 4; %add3; %add6; %add9],
and all are in the same scheduling region, whose id equals 1.

dtemirbulatov added inline comments.May 22 2018, 6:34 PM

lib/Transforms/Vectorize/SLPVectorizer.cpp
2611	"Why does their order change?" Sometimes we have to reschedule a pseudo instruction first in both bundles in order to form correct dependencies.

@dtemirbulatov @ABataev Is there anything that can be done to please keep this patch moving? The next release branch won't be far away now and I originally requested this over a year and a half ago (PR30787), and ideally I'd like to get this in and then I can more easily implement PR33744 as well in time.

I notice there's still some minor refactoring in the patch to use InstructionState in newTreeEntry/tryScheduleBundle calls - is it worth getting that done quickly first?

What about testing - how much testing has been done with external code?

In D28907#1120098, @RKSimon wrote:

@dtemirbulatov @ABataev Is there anything that can be done to please keep this patch moving? The next release branch won't be far away now and I originally requested this over a year and a half ago (PR30787), and ideally I'd like to get this in and then I can more easily implement PR33744 as well in time.

I think that algorithm at 4512 is incorrect, we need to copy common instructions for bundles, instead of reordering bundles. Alexey?

In D28907#1120098, @RKSimon wrote:

@dtemirbulatov @ABataev Is there anything that can be done to please keep this patch moving? The next release branch won't be far away now and I originally requested this over a year and a half ago (PR30787), and ideally I'd like to get this in and then I can more easily implement PR33744 as well in time.

I notice there's still some minor refactoring in the patch to use InstructionState in newTreeEntry/tryScheduleBundle calls - is it worth getting that done quickly first?

What about testing - how much testing has been done with external code?

Hi guys, sorry, but I cannot review this patch currently as I'm on vacation for 3 more weeks. It would be good if somebody else could look at this patch. Try asking Ayal Zaks (@Ayal), maybe he could help you. As for the implementation itself, I don't like the way it is implemented now. The current scheduling implementation looks like a crutch or a hack. It is very complex and has some unclear logic that may break in many ways.

Vasilis added a subscriber: Vasilis.Jun 5 2018, 3:02 PM

Vasilis removed a subscriber: Vasilis.Jun 5 2018, 3:04 PM

vporpo added a subscriber: vporpo.Jun 5 2018, 3:16 PM

RKSimon mentioned this in D48120: [SLPVectorizer] Remove RawInstructionsData/getMainOpcode and merge into getSameOpcode.Jun 13 2018, 6:14 AM

RKSimon mentioned this in rL334701: [SLPVectorizer] Remove RawInstructionsData/getMainOpcode and merge into….Jun 14 2018, 3:30 AM

RKSimon mentioned this in D48359: [SLPVectorizer] Use InstructionsState to record AltOpcode.Jun 20 2018, 4:18 AM

RKSimon mentioned this in rL335134: [SLPVectorizer] Use InstructionsState to record AltOpcode.Jun 20 2018, 8:18 AM

Hi, when do you plan to update the patch for the latest version? I have an idea for improving the scheduling model; maybe it will help us resolve the problems with the patch.

In D28907#1149639, @ABataev wrote:

Hi, when do you plan to update the patch for the latest version? I have an idea for improving the scheduling model; maybe it will help us resolve the problems with the patch.

Hi, I think I have one issue remaining with LNT's MultiSource/Benchmarks/Olden/bh/bh. I will rebase the current version today or tomorrow. Thanks.

@dtemirbulatov @ABataev I'm not sure how much this will cross over but I'm investigating how to extend the alternate opcode mechanism to work with non-binary instructions. Initially I'm just looking at cast + call operators (e.g. sitofp/uitofp, floor/ceil etc.) but even that requires moderate refactoring of the getEntryCost and vectorizeTree ShuffleVector handling. What I'm not sure of is whether we could extend this idea to accept any partial vectorization and the alternate becomes a pass through - what do you think?

If not there is scope to further simplify this patch, for instance @spatel's work in InstCombine for PR37806 should mean that you can rely on InstCombine to perform much of the work in getDefaultConstantForOpcode etc. and all SLP needs to do is create a "passthrough" SK_Select shuffle stage.

In D28907#1149688, @RKSimon wrote:

If not there is scope to further simplify this patch, for instance @spatel's work in InstCombine for PR37806 should mean that you can rely on InstCombine to perform much of the work in getDefaultConstantForOpcode etc. and all SLP needs to do is create a "passthrough" SK_Select shuffle stage.

For reference in D48830, I'm proposing that we use (and extend) ConstantExpr::getBinOpIdentity(). IIUC, that would be the same thing that is shown as getDefaultConstantForOpcode() here.

In D28907#1149688, @RKSimon wrote:

@dtemirbulatov @ABataev I'm not sure how much this will cross over but I'm investigating how to extend the alternate opcode mechanism to work with non-binary instructions. Initially I'm just looking at cast + call operators (e.g. sitofp/uitofp, floor/ceil etc.) but even that requires moderate refactoring of the getEntryCost and vectorizeTree ShuffleVector handling. What I'm not sure of is whether we could extend this idea to accept any partial vectorization and the alternate becomes a pass through - what do you think?

If not there is scope to further simplify this patch, for instance @spatel's work in InstCombine for PR37806 should mean that you can rely on InstCombine to perform much of the work in getDefaultConstantForOpcode etc. and all SLP needs to do is create a "passthrough" SK_Select shuffle stage.

I think this is possible if I correctly understood your idea. But we need to make this patch land at first. To do so we need to resolve the problems with the scheduling.

In D28907#1149688, @RKSimon wrote:

If not there is scope to further simplify this patch, for instance @spatel's work in InstCombine for PR37806 should mean that you can rely on InstCombine to perform much of the work in getDefaultConstantForOpcode etc. and all SLP needs to do is create a "passthrough" SK_Select shuffle stage.

Yes, this is also a possible solution. What we need to do in this case is to tweak the cost model + improve gathering algorithm. Yes, that might work.

In D28907#1149700, @spatel wrote:

In D28907#1149688, @RKSimon wrote:

If not there is scope to further simplify this patch, for instance @spatel's work in InstCombine for PR37806 should mean that you can rely on InstCombine to perform much of the work in getDefaultConstantForOpcode etc. and all SLP needs to do is create a "passthrough" SK_Select shuffle stage.

For reference in D48830, I'm proposing that we use (and extend) ConstantExpr::getBinOpIdentity(). IIUC, that would be the same thing that is shown as getDefaultConstantForOpcode() here.

Yes, I see, we can try to use it. At least we need to think about this solution and estimate all the pros and cons here.

Here is the change as of before commit 74cd05c4a4f94a27daf2d3fc173e7213060cc47c; I am currently rebasing the change.

dtemirbulatov added inline comments.Jul 2 2018, 12:17 PM

lib/Transforms/Vectorize/SLPVectorizer.cpp
450	we don't need to do this.

dtemirbulatov added inline comments.Jul 2 2018, 12:22 PM

lib/Transforms/Vectorize/SLPVectorizer.cpp
2611	I will move this solution to a dedicated function, so we don't have to measure distance here.

OK - please shout if there is anything I can do to help - I realise my alternate opcode refactoring has made rebasing this patch less straightforward!

spatel mentioned this in rL336215: [Constants] add identity constants for fadd/fmul.Jul 3 2018, 10:17 AM

spatel mentioned this in D48893: [Constants, InstCombine] allow RHS (operand 1) identity constants for binops.Jul 3 2018, 12:41 PM

spatel mentioned this in rL336444: [Constants] extend getBinOpIdentity(); NFC.Jul 6 2018, 8:23 AM

RKSimon mentioned this in D49135: [SLPVectorizer] Add initial alternate opcode support for cast instructions..Jul 10 2018, 7:22 AM

RKSimon mentioned this in rL336804: [SLPVectorizer] Add initial alternate opcode support for cast instructions..Jul 11 2018, 6:39 AM

RKSimon mentioned this in rL336812: [SLPVectorizer] Add initial alternate opcode support for cast instructions..Jul 11 2018, 8:10 AM

RKSimon mentioned this in D49225: [SLPVectorizer] Move scalar/vector costs to helper functions (NFCI)..Jul 12 2018, 2:40 AM

RKSimon mentioned this in rL336989: [SLPVectorizer] Add initial alternate opcode support for cast instructions..Jul 13 2018, 4:14 AM

Rebase. I still need to implement bundle reordering instead of the change in setInsertPointAfterBundle(); fully tested this change. I will split this change into several independent reviews.

Fixed one more issue with duplicating memory dependencies in calculateDependencies() by checking whether we already counted this particular dependency.

RKSimon added inline comments.Jul 31 2018, 4:07 AM

lib/Transforms/Vectorize/SLPVectorizer.cpp
311	InstructionState was keeping the Base/Alt instructions the same if AltOpcodeNum == 0 (BaseIndex == AltIndex) - why are you inserting nulls?
1293	Can we use ConstantExpr::getBinOpIdentity instead?

dtemirbulatov added inline comments.Jul 31 2018, 5:41 AM

lib/Transforms/Vectorize/SLPVectorizer.cpp
311	yes, correct, Thanks.
1293	No, ConstantExpr::getBinOpIdentity supports only commutative operations.

lebedev.ri added a subscriber: lebedev.ri.Jul 31 2018, 5:44 AM

lebedev.ri added inline comments.

lib/Transforms/Vectorize/SLPVectorizer.cpp
1293	Are you sure?
	Constant *ConstantExpr::getBinOpIdentity(unsigned Opcode, Type *Ty, bool AllowRHSConstant) {
	  ...
	  // Non-commutative opcodes: AllowRHSConstant must be set.
	  if (!AllowRHSConstant)
	    return nullptr;

dtemirbulatov added inline comments.Jul 31 2018, 6:01 AM

lib/Transforms/Vectorize/SLPVectorizer.cpp
1293	Oh, yes, sorry, it is just a different behaviour here. We want, for example, 1 for division, and ConstantExpr::getBinOpIdentity would return 0.
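
The exchange above turns on the difference between a two-sided identity (commutative opcodes) and a right-hand-only identity (non-commutative opcodes). The following standalone sketch illustrates that distinction for a few integer opcodes. It is not LLVM code: `BinOp`, `getIdentity` and `evaluate` are hypothetical names, and the table is only assumed to follow the same idea as the `AllowRHSConstant` parameter quoted above.

```cpp
#include <cassert>

// Hypothetical opcode set for illustration; this is not LLVM's Instruction enum.
enum class BinOp { Add, Sub, Mul, UDiv, And, Or, Xor, Shl };

// Writes into C a constant such that `X op C == X` for every X and returns
// true. Commutative opcodes always have one. Non-commutative opcodes only
// have a right-hand identity, so they are rejected unless AllowRHSConstant
// is set.
bool getIdentity(BinOp Op, bool AllowRHSConstant, unsigned &C) {
  switch (Op) {
  case BinOp::Add:
  case BinOp::Or:
  case BinOp::Xor:
    C = 0u;
    return true;
  case BinOp::Mul:
    C = 1u;
    return true;
  case BinOp::And:
    C = ~0u;
    return true;
  case BinOp::Sub:
  case BinOp::Shl:
    if (!AllowRHSConstant)
      return false;
    C = 0u;
    return true;
  case BinOp::UDiv:
    if (!AllowRHSConstant)
      return false;
    C = 1u;
    return true;
  }
  return false;
}

// Evaluates `X op C` on unsigned integers so the identities can be checked.
unsigned evaluate(BinOp Op, unsigned X, unsigned C) {
  switch (Op) {
  case BinOp::Add:  return X + C;
  case BinOp::Sub:  return X - C;
  case BinOp::Mul:  return X * C;
  case BinOp::UDiv: assert(C != 0 && "division by zero"); return X / C;
  case BinOp::And:  return X & C;
  case BinOp::Or:   return X | C;
  case BinOp::Xor:  return X ^ C;
  case BinOp::Shl:  assert(C < sizeof(unsigned) * 8 && "shift too wide"); return X << C;
  }
  return X;
}
```

In this sketch `x - 0` and `x / 1` leave `x` unchanged, but `0 - x` and `1 / x` do not, which is why the non-commutative cases are gated behind the flag.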

Rebase, Fixed RKSimon's remark, implemented bundle reordering function.

Add hash code based indexing instead of instruction based, split ScheduleData into InstScheduleData, PseudoScheduleData, improve diagnostics of bundle scheduling.

Why don't you want to use pair<Instruction*, Opcode> as the key in all maps/sets? I expect that it would lead to a much simpler solution.

lib/Transforms/Vectorize/SLPVectorizer.cpp
182	Instead, I would check for the supported instructions rather than for the unsupported ones.

ABataev added inline comments.Sep 28 2018, 1:14 PM

lib/Transforms/Vectorize/SLPVectorizer.cpp
271	Not sure if this always produces unique hashes; if not, it might lead to incorrect compiler behavior.

Implemented pair<Value*, Opcode> as the key in all maps/sets.
Fixed issue with incorrect memory dependency that is attached in testcase memory-dep.ll.
Allowed non-alternative operations to be stored in InstScheduleDataMap.
Removed IsNonAlt field out of InstructionsState.
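
As a rough illustration of the keying scheme described in this update, the sketch below indexes entries by a (value, opcode) pair so that one scalar can sit in two bundles at once, once under its real opcode and once as a pseudo operation. All names here (`Value`, `TreeEntry`, `EntryKey`, `ScalarToEntryMap`) are hypothetical stand-ins, and `std::map` is used instead of LLVM's containers to keep the example self-contained.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <utility>

// Hypothetical stand-ins for the vectorizer's value and tree-entry types.
struct Value {
  int Id;
};
struct TreeEntry {
  int Index;
};

// Key: the scalar plus the opcode of the bundle it participates in.
using EntryKey = std::pair<const Value *, unsigned>;

// Strict weak ordering on keys; std::less gives a total order on pointers.
struct EntryKeyLess {
  bool operator()(const EntryKey &A, const EntryKey &B) const {
    std::less<const Value *> PtrLess;
    if (PtrLess(A.first, B.first))
      return true;
    if (PtrLess(B.first, A.first))
      return false;
    return A.second < B.second;
  }
};

class ScalarToEntryMap {
  std::map<EntryKey, TreeEntry *, EntryKeyLess> Map;

public:
  void insert(const Value *V, unsigned Opcode, TreeEntry *E) {
    assert(V && E && "value and entry must be non-null");
    Map[EntryKey(V, Opcode)] = E;
  }

  // Returns the entry registered for (V, Opcode), or nullptr if none.
  TreeEntry *lookup(const Value *V, unsigned Opcode) const {
    auto It = Map.find(EntryKey(V, Opcode));
    return It == Map.end() ? nullptr : It->second;
  }
};
```

With a plain `Value *` key the second registration would overwrite the first; the pair key keeps both.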

dtemirbulatov added inline comments.Oct 10 2018, 3:46 PM

lib/Transforms/Vectorize/SLPVectorizer.cpp
182	The other list is a bit long in my implementation, this one looks better.

Update after I found a couple more errors during additional testing of the change. Here are the changes:
Removed the OpValue field from PseudoScheduleData.
Forbid any bundles with non-alternative operations and a remainder operation, see rem-bundle.ll.
Fixed an error in setInsertPointAfterBundle() by using getScheduleData() instead of getInstrScheduleData; if a bundle member is present in multiple bundles at the same time, walk through the bundle to find the last scheduled member of the bundle, see insert-after-multiple-bundle.ll.
Restored MemoryDependencies to a SmallVector; we don't have to check whether a member is already present in calculateDependencies().

Implemented Map<Instruction*, std::pair<Value *Parent, unsigned Opcode>> indexing for ScalarToTreeEntry and PseudoInstScheduleDataMap.
Added a reorderBundles() function to reorder bundles that have common instructions where the common instructions were scheduled at least twice. We don't want to note which bundle was scheduled first; we can determine the instruction layout after SLP scheduling.

RKSimon added inline comments.Nov 5 2018, 9:42 AM

lib/Transforms/Vectorize/SLPVectorizer.cpp
178	Repeated Instruction::SRem

RKSimon added inline comments.Nov 5 2018, 9:46 AM

lib/Transforms/Vectorize/SLPVectorizer.cpp
1350	@spatel @dtemirbulatov Can we use getBinOpIdentity yet ?

ABataev added inline comments.Nov 5 2018, 10:04 AM

lib/Transforms/Vectorize/SLPVectorizer.cpp
182	I still think it is better to limit this with the list of supported operations rather than the list of the unsupported ones.
905	Very strange that you still need particular scheduling class for the pseudo instructions, I think you can use the original data structure if you correctly implement the pseudo instruction itself. I still don't see all the changes we talked about.

spatel added inline comments.Nov 5 2018, 2:27 PM

lib/Transforms/Vectorize/SLPVectorizer.cpp
1350	Yes - if anyone has suggestions for making that 'AllowRHSConstant' param clearer, let me know. The only problem that I see is that this code is returning +0.0 as the default constant for an fadd (because the caller guarantees 'nsz'?). I worked around something like that here: rL346143. I can't tell if SLP would want to do something like that.

dtemirbulatov added inline comments.Nov 7 2018, 8:11 AM

lib/Transforms/Vectorize/SLPVectorizer.cpp
178	oh, thanks. I missed that.
182	ok, I will change that.
905	ok, I am implementing now.
1350	ok, yes, looks like it is going to work.

Looks like I have addressed all previous remarks. Also, during testing I found two more issues with the change and fixed both.

dtemirbulatov added inline comments.Nov 14 2018, 10:22 AM

lib/Transforms/Vectorize/SLPVectorizer.cpp
3415	We have to clear all dependency calculations since the pseudo instruction might use an already calculated SD node with a calculated dependency; look at the scheduling_pseudo.ll testcase.
3796	Please check the schedule-bundle1.ll testcase; without this change scheduling is not correct.

Oops, I found a few typos. Formatted tryToRepresentAsInstArg() and removed the second SRem from isRemainder().

ABataev added inline comments.Nov 14 2018, 10:44 AM

lib/Transforms/Vectorize/SLPVectorizer.cpp
178	Still 2 `SRem`s
269–285	Restore the original function here.
270	You definitely need comments here
593–594	Instead of `ArrayRef<Value *>` I expected to see something like `ArrayRef<InstructionOrPseudoInstruction>`, where `InstructionOrPseudoInstruction` is the class that represents the instruction/Value itself, or the pseudo instruction.
620–624	Why do you need this?
638	Again, you need all this code because you did not implement what we discussed. Try to use an `InstructionOrPseudoInstruction`-like class to represent values/instructions and pseudo-instructions. It should be much easier to implement and a lot of changes will just go away.
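
The thread does not show what such a class would look like, so the following is only a guess at its general shape, written as a standalone sketch. `Opcode`, `Instruction` and `InstructionOrPseudo` are hypothetical types defined here for illustration. The assumption is that a lane counts as a pseudo operation when the bundle's opcode differs from the opcode of the underlying scalar.

```cpp
#include <cassert>

// Hypothetical opcode and instruction types for illustration only.
enum class Opcode { Load, Add, Mul };
struct Instruction {
  Opcode Op;
};

// One lane of a bundle: either a real instruction with the bundle's opcode,
// or a pseudo operation that wraps a scalar of a different opcode (for
// example a load treated as "load + 0" inside an add bundle).
class InstructionOrPseudo {
  Instruction *Real; // underlying scalar instruction, always set
  Opcode BundleOp;   // opcode of the bundle this lane belongs to

public:
  InstructionOrPseudo(Instruction *I, Opcode BundleOpcode)
      : Real(I), BundleOp(BundleOpcode) {
    assert(I && "a lane always wraps an existing instruction");
  }

  // True when the lane only pretends to have the bundle's opcode.
  bool isPseudo() const { return Real->Op != BundleOp; }
  Opcode getOpcode() const { return BundleOp; }
  Instruction *getInstruction() const { return Real; }
};
```

Code that walks a bundle can then ask every lane for `getOpcode()` uniformly and consult `isPseudo()` only where the difference matters.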

@dtemirbulatov Any movement on this? It'd be great to get this in for the 8.0 release!

In D28907#1318422, @RKSimon wrote:

@dtemirbulatov Any movement on this? It'd be great to get this in for the 8.0 release!

Yes, I just refactored getTreeEntry and ... according to ABataev's request. I will update after additional testing in a day or two.

Introduced InstructionOrPseudo structure and removed "Instruction::Invoke" out of tryToRepresentAsInstArg() function with error example in invoke.ll testcase.

lebedev.ri added inline comments.Dec 7 2018, 6:58 AM

test/Transforms/SLPVectorizer/X86/invoke.ll
3 ↗	(On Diff #177207)	Unless this currently crashes, precommit?

dtemirbulatov marked an inline comment as done.Dec 7 2018, 7:06 AM

dtemirbulatov added inline comments.

test/Transforms/SLPVectorizer/X86/invoke.ll
3 ↗	(On Diff #177207)	yes, thanks, I will remove this line.

dtemirbulatov marked an inline comment as done.Dec 7 2018, 7:23 AM

dtemirbulatov added inline comments.

test/Transforms/SLPVectorizer/X86/invoke.ll
3 ↗	(On Diff #177207)	Oh, No crash currently on pre-commit, I will remove the whole test.

lebedev.ri added inline comments.Dec 7 2018, 7:26 AM

test/Transforms/SLPVectorizer/X86/invoke.ll
3 ↗	(On Diff #177207)	What I meant to say is: unless this test file currently causes a crash, it would be best to commit it now, so that the review shows only the diff and not a whole new test file.

dtemirbulatov marked an inline comment as done.Dec 7 2018, 7:44 AM

dtemirbulatov added inline comments.

test/Transforms/SLPVectorizer/X86/invoke.ll
3 ↗	(On Diff #177207)	Ok

ABataev added inline comments.Dec 7 2018, 8:41 AM

lib/Transforms/Vectorize/SLPVectorizer.cpp
175–263	Turn this to a class. Also, not enough comments, the design of the structure also should be improved. Currently, it is too complex. Maybe base class + templates will help.
269–285	I think this function also should accept `ArrayRef<InstructionOrPseudo *>` as an input param
272	Please, follow the coding standard in the new code. Also, some of the functions can be replaced by some standard functions, just like in this case it can be replaced by `llvm::all_of`
288	I think the pseudo instruction can always be considered as being in the same block, because you can easily put it into the current block.
555	Left and Right must be `SmallVectorImpl<InstructionOrPseudo *> &`?
588	Restore the original
638	Also, seems to me this must be an `InstructionOrPseudoOp`
770	Why do you need this new function? No comments and explanation.
816	Pseudo instruction must be just ignored?
923	Again, I think it must be `InstructionOrPseudoOp`
992–1004	Again, `InstructionOrPseudoOp`
1046	If you implement `InstructionOrPseudoOp` correctly, this reorder stuff should not be required, I think
1061	Why you cannot store everything in a single map?

RKSimon added a reviewer: anton-afanasyev.Dec 27 2018, 7:20 AM

dtemirbulatov marked an inline comment as done.Dec 28 2018, 6:29 AM

dtemirbulatov added inline comments.

lib/Transforms/Vectorize/SLPVectorizer.cpp
1046	Well, there should be several (>=2) independent scheduling events (one for the real instruction and others for pseudos) and there is just one real instruction in the end. I don't see how it could be done without reordering or tracking the last scheduled instance of the same instruction. We could introduce something like an IsLastScheduled field in the ScheduleData struct, but it would be quite similar to reordering.

ABataev added inline comments.Dec 28 2018, 6:39 AM

lib/Transforms/Vectorize/SLPVectorizer.cpp
1046	If you add the real instruction instead of this pseudo-instruction, will you need all these scheduling events? No. Will you need to do some extra reordering etc.? No. Why can't you simulate it with the new class/structure?

dtemirbulatov marked an inline comment as done.Dec 28 2018, 7:22 AM

dtemirbulatov added inline comments.

lib/Transforms/Vectorize/SLPVectorizer.cpp
1046	do you mean that the last one scheduled becomes the real one and we just ignore any pseudos?

ABataev added inline comments.Dec 28 2018, 7:24 AM

lib/Transforms/Vectorize/SLPVectorizer.cpp
1046	No. I mean, that if you insert the real instructions instead of those pseudo-instructions, you won't need all that reordering/new scheduling etc. Why can't you mimic this behavior with the pseudo-instruction?

dtemirbulatov marked an inline comment as done.Dec 28 2018, 10:22 AM

dtemirbulatov added inline comments.

lib/Transforms/Vectorize/SLPVectorizer.cpp
1046	Hmm, there are at least NextLoadStore dependencies that we break if we, for example, insert a real Load instruction somewhere. Or we could recalculate NextLoadStore. Or maybe mimic the pseudo in some other way.

ABataev added inline comments.Dec 28 2018, 10:24 AM

lib/Transforms/Vectorize/SLPVectorizer.cpp
1046	I don't think that they can be broken, as we're not going to insert new Loads/Stores, just some binops. So, the loads/stores and the corresponding dependencies should not be affected.

removed bundle reordering by replacing pseudo instructions with real ones.

dtemirbulatov marked 12 inline comments as done.Jan 18 2019, 12:19 PM

dtemirbulatov added inline comments.

lib/Transforms/Vectorize/SLPVectorizer.cpp
175–263	I see this class as just one, BTW it is a container.
288	I have this functionality, but let us for now minimize functionality change here, I will follow this change right after that check-in.
555	That change further increases the size of this review, we can change that later.
638	no, looks like "Instruction" here is more convenient.

dtemirbulatov mentioned this in D57409: [SLP] Allow to duplicate instruction in multiple bundles by introducing pseudo operations..Jan 29 2019, 12:10 PM

This revision was not accepted when it landed; it landed in state Needs Review.Oct 7 2019, 5:32 AM

Closed by commit rGe2358b53bc09: [SLPVectorizer] Failure to beneficially vectorize 'copyable' elements in… (authored by dtemirbulatov).

This revision was automatically updated to reflect the committed changes.

Herald added a project: Restricted Project. Oct 7 2019, 5:32 AM

Herald added a subscriber: hiraditya.

reopening - phab seems to be a bit broken

RKSimon requested changes to this revision.Oct 7 2019, 6:07 AM

This revision now requires changes to proceed.Oct 7 2019, 6:07 AM

Reping

In D28907#2278838, @xbolva00 wrote:

Reping

In process of rebasing and simplifying proposed change.

xbolva00 mentioned this in D102748: [LoopUnroll] Don't unroll before vectorisation.May 19 2021, 2:00 AM

Hello, this work was brought to my attention in D102748. Would be good to get this in, and seems almost finished too. Any plans to pick this up? Otherwise I think I would be interested in doing that.

In D28907#2770982, @SjoerdMeijer wrote:

Hello, this work was brought to my attention in D102748. Would be good to get this in, and seems almost finished too. Any plans to pick this up? Otherwise I think I would be interested in doing that.

This should be much easier to implement after non-power-2 in SLP support.

In D28907#2770984, @ABataev wrote:

In D28907#2770982, @SjoerdMeijer wrote:

Hello, this work was brought to my attention in D102748. Would be good to get this in, and seems almost finished too. Any plans to pick this up? Otherwise I think I would be interested in doing that.

This should be much easier to implement after non-power-2 in SLP support.

Could you elaborate a bit more on this just for my understanding? E.g., what is this exactly, and is this an ongoing effort in the SLP vectoriser?

In D28907#2770989, @SjoerdMeijer wrote:

In D28907#2770984, @ABataev wrote:

In D28907#2770982, @SjoerdMeijer wrote:

Hello, this work was brought to my attention in D102748. Would be good to get this in, and seems almost finished too. Any plans to pick this up? Otherwise I think I would be interested in doing that.

This should be much easier to implement after non-power-2 in SLP support.

Could you elaborate a bit more on this just for my understanding? E.g., what is this exactly, and is this an ongoing effort in the SLP vectoriser?

Here is D57059. I'm splitting it into smaller patches and committing them step-by-step.

Thanks, interesting, having a look there!

Matt added a subscriber: Matt.May 20 2021, 8:07 AM

In D28907#2770984, @ABataev wrote:

In D28907#2770982, @SjoerdMeijer wrote:

Hello, this work was brought to my attention in D102748. Would be good to get this in, and seems almost finished too. Any plans to pick this up? Otherwise I think I would be interested in doing that.

This should be much easier to implement after non-power-2 in SLP support.

Can you explain why?

x1 + 1
x2
x3 + 6
x4 + 8

How will that patch help you? You need to add a “fake” no-op addition “+ 0” for x2.

In D28907#2775629, @xbolva00 wrote:

In D28907#2770984, @ABataev wrote:

In D28907#2770982, @SjoerdMeijer wrote:

Hello, this work was brought to my attention in D102748. Would be good to get this in, and seems almost finished too. Any plans to pick this up? Otherwise I think I would be interested in doing that.

This should be much easier to implement after non-power-2 in SLP support.

Can you explain why?

x1 + 1
x2
x3 + 6
x4 + 8

How will that patch help you? You need to add a “fake” no-op addition “+ 0” for x2.

It is pretty similar to the non-pow-2 functionality; instead of undefs, we need to use the neutral element of the operation: 0 for adds, 1 for muls, etc. I can't provide more details currently, but I'm going to revive it after the non-pow-2 patch. I assume we won't need to change the scheduling model.

Ah right:) I understand now.
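The idea discussed above (pad the "copyable" lane with the neutral element of the operation so that every lane has the same opcode) can be sketched as follows. This is only an illustration under stated assumptions: `BinOp`, `Lane`, `identityFor` and `applyPadded` are hypothetical names and do not appear in the patch or in LLVM.

```cpp
#include <array>
#include <cstddef>

// Hypothetical opcode enum for illustration; not LLVM's Instruction opcodes.
enum class BinOp { Add, Mul };

// Neutral element of the operation: x op identityFor(op) == x.
constexpr int identityFor(BinOp Op) { return Op == BinOp::Add ? 0 : 1; }

// One lane of a bundle: either "X op C" or a bare "X" (a copyable element).
struct Lane {
  int X;
  bool HasOp;
  int C; // Ignored when HasOp is false.
};

// Evaluates all lanes with a single opcode. Lanes without an operation are
// padded with the neutral element, so "x2" is treated as "x2 + 0" for adds.
template <std::size_t N>
std::array<int, N> applyPadded(BinOp Op, const std::array<Lane, N> &Lanes) {
  std::array<int, N> Out{};
  for (std::size_t I = 0; I < N; ++I) {
    int C = Lanes[I].HasOp ? Lanes[I].C : identityFor(Op);
    Out[I] = (Op == BinOp::Add) ? Lanes[I].X + C : Lanes[I].X * C;
  }
  return Out;
}
```

For the four lanes quoted in the thread (x1 + 1, x2, x3 + 6, x4 + 8), the constant operand vector would become {1, 0, 6, 8}, and the second lane is returned unchanged.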

Revision Contents

Path	Size
lib/Transforms/Vectorize/SLPVectorizer.cpp	945 lines
test/Transforms/SLPVectorizer/X86/vect_copyable_in_binops.ll	200 lines

Diff 107683

lib/Transforms/Vectorize/SLPVectorizer.cpp

[163 lines hidden; context: case Instruction::FAdd:]
return Instruction::FSub;		return Instruction::FSub;
case Instruction::FSub:		case Instruction::FSub:
return Instruction::FAdd;		return Instruction::FAdd;
case Instruction::Add:		case Instruction::Add:
return Instruction::Sub;		return Instruction::Sub;
case Instruction::Sub:		case Instruction::Sub:
return Instruction::Add;		return Instruction::Add;
default:		default:
return 0;		return Op;
}		}
}		}

/// true if the \p Value is odd, false otherwise.
static bool isOdd(unsigned Value) {		static bool isOdd(unsigned Value) {
return Value & 1;		return Value & 1;
}		}
		RKSimon [Not Done]: Repeated Instruction::SRem
		dtemirbulatov (Author) [Not Done]: oh, thanks. I missed that.
		ABataev [Not Done]: Still 2 `SRem`s

///\returns bool representing if Opcode \p Op can be part		static bool sameOpcodeOrAlt(unsigned Opcode, unsigned AltOpcode,
/// of an alternate sequence which can later be merged as		unsigned CheckedOpcode) {
/// a ShuffleVector instruction.		return Opcode == CheckedOpcode || AltOpcode == CheckedOpcode;
		filcab [Not Done]: No need to document this one. Simple, explanatory name, and static
		ABataev [Not Done]: Instead, I would check for the supported instructions rather than for the unsupported ones.
		dtemirbulatov (Author) [Not Done]: The other list is a bit long in my implementation, this one looks better.
		ABataev [Not Done]: I still think it is better to limit this with the list of supported operations rather than the list of the unsupported ones.
		dtemirbulatov (Author) [Not Done]: ok, I will change that.
static bool canCombineAsAltInst(unsigned Op) {
return Op == Instruction::FAdd || Op == Instruction::FSub ||
Op == Instruction::Sub || Op == Instruction::Add;
}		}
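The opcode pairing used by the hunk above can be illustrated in isolation. The following is a minimal sketch that assumes a mock opcode enum (`MockOpcode` is hypothetical); the function names mirror those visible in the diff, but the bodies are written for illustration and are not copied from the patch.

```cpp
// Hypothetical stand-in for llvm::Instruction opcodes.
enum MockOpcode : unsigned { MAdd = 1, MSub, MFAdd, MFSub, MMul, MLoad };

// Maps an opcode to the one it may alternate with in a shuffle sequence.
// Opcodes without a partner map to themselves, as in the new side of the
// diff (the old side returned 0 here).
unsigned getAltOpcode(unsigned Op) {
  switch (Op) {
  case MFAdd:
    return MFSub;
  case MFSub:
    return MFAdd;
  case MAdd:
    return MSub;
  case MSub:
    return MAdd;
  default:
    return Op;
  }
}

// True if CheckedOpcode is either the main opcode or its alternate.
bool sameOpcodeOrAlt(unsigned Opcode, unsigned AltOpcode,
                     unsigned CheckedOpcode) {
  return Opcode == CheckedOpcode || AltOpcode == CheckedOpcode;
}
```

With this mapping, a `sub` lane is accepted in an `add` bundle, while a `load` lane is not and has to be handled by the separate "copyable" logic.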

/// \returns ShuffleVector instruction if instructions in \p VL have		/// Chooses the correct key for scheduling data. If \p Op has the same (or
/// alternate fadd,fsub / fsub,fadd/add,sub/sub,add sequence.		/// alternate) opcode as \p OpValue, the key is \p Op. Otherwise the key is \p
		filcab [Not Done]: Nit: I'd rather have `CheckedOpcode` come first. And probably call it "is <something>". Like `isOneOf` or similar (generic name is ok if we only need one function like this). Should be more readable.
/// (i.e. e.g. opcodes of fadd,fsub,fadd,fsub...)		/// OpValue.
static unsigned isAltInst(ArrayRef<Value *> VL) {		static Value *isOneOf(Value *OpValue, Value *Op) {
Instruction *I0 = dyn_cast<Instruction>(VL[0]);		auto *I = dyn_cast<Instruction>(Op);
		if (!I)
		return OpValue;
		auto *OpInst = cast<Instruction>(OpValue);
		unsigned OpInstOpcode = OpInst->getOpcode();
		unsigned IOpcode = I->getOpcode();
		if (sameOpcodeOrAlt(OpInstOpcode, getAltOpcode(OpInstOpcode), IOpcode))
		return Op;
		return OpValue;
		}

		/// Checks if the \p Opcode can be considered as an operand of a (possibly)
		/// binary operation \p I.
		/// \returns The code of the binary operation of instruction \p I if the
		/// instruction with \p Opcode can be considered as an operand of \p I with the
		/// default value.
		static unsigned tryToRepresentAsInstArg(unsigned Opcode, Instruction *I) {
		assert(!sameOpcodeOrAlt(Opcode, getAltOpcode(Opcode), I->getOpcode())
		&& "Invalid Opcode");
		if (Opcode != Instruction::PHI &&
		(I->getType()->isIntegerTy() || I->hasUnsafeAlgebra()) &&
		isa<BinaryOperator>(I))
		return I->getOpcode();
		return 0;
		RKSimon [Not Done]: Always add an assert message.
		}

		namespace {
		/// Contains data for the instructions going to be vectorized.
		struct RawInstructionsData {
		/// Main Opcode of the instructions going to be vectorized.
		unsigned Opcode = 0;
		/// Position of the first instruction with the \a Opcode.
		unsigned OpcodePos = 0;
		/// Need an additional analysis (if at least one of the instruction is not
		/// same instruction kind as an instruction at OpcodePos position in the
		/// list).
		bool NeedAnalysis = false;
		/// The list of instructions have some instructions with alternate opcodes.
		bool HasAltOpcodes = false;
		};
		} // namespace

		/// Checks the list of the vectorized instructions \p VL and returns info about
		/// this list.
		static RawInstructionsData getMainOpcode(ArrayRef<Value *> VL) {
		auto *I0 = dyn_cast<Instruction>(VL[0]);
		if (!I0)
		return {};
		RawInstructionsData Res;
unsigned Opcode = I0->getOpcode();		unsigned Opcode = I0->getOpcode();
unsigned AltOpcode = getAltOpcode(Opcode);		unsigned AltOpcode = getAltOpcode(Opcode);
		RKSimon [Not Done]: You can probably tidy this up by doing an early-out: `if (sameOpcodeOrAlt(Opcode, AltOpcode, I->getOpcode())) continue;`
for (int i = 1, e = VL.size(); i < e; i++) {		unsigned NewOpcodePos = 0;
Instruction *I = dyn_cast<Instruction>(VL[i]);		for (unsigned Cnt = 0, E = VL.size(); Cnt != E; ++Cnt) {
		RKSimon [Not Done]: Comments here on what is being attempted in the loop
if (!I || I->getOpcode() != (isOdd(i) ? AltOpcode : Opcode))		auto *I = dyn_cast<Instruction>(VL[Cnt]);
return 0;		if (!I)
		return {};
		if (!sameOpcodeOrAlt(Opcode, AltOpcode, I->getOpcode())) {
		if (unsigned NewOpcode = tryToRepresentAsInstArg(Opcode, I)) {
		if (!Instruction::isBinaryOp(Opcode) ||
		!Instruction::isCommutative(Opcode)) {
		NewOpcodePos = Cnt;
		Opcode = NewOpcode;
		AltOpcode = getAltOpcode(Opcode);
		Res.NeedAnalysis = true;
		}
		} else if (tryToRepresentAsInstArg(I->getOpcode(),
		cast<Instruction>(VL[NewOpcodePos])))
		Res.NeedAnalysis = true;
		else
		ABataev [Not Done]: Enclose it into braces
		return {};
		} else if (Opcode != I->getOpcode()) {
		ABataev [Not Done]: This too
		Res.HasAltOpcodes = true;
		if (Res.NeedAnalysis && isOdd(NewOpcodePos))
		std::swap(Opcode, AltOpcode);
		}
		ABataev [Not Done]: Turn this to a class. Also, not enough comments, the design of the structure also should be improved. Currently, it is too complex. Maybe base class + templates will help.
		dtemirbulatov (Author) [Done]: I see this class as just one, BTW it is a container.
}		}
return Instruction::ShuffleVector;		Res.Opcode = Opcode;
		Res.OpcodePos = NewOpcodePos;
		return Res;
}		}

		namespace {
		ABataev [Not Done]: `bool IsNonAlt = llvm::one_of(VL, [Opcode, AltOpcode](Value *V) {return isa<Instruction>(V) && !sameOpcodeOrAlt(Opcode, AltOpcode, cast<Instruction>(V)->getOpcode());});`
		ABataev [Not Done]: You definitely need comments here
		/// Main data required for vectorization of instructions.
		ABataev [Not Done]: Not sure if this always produces unique hashes, might lead to the incorrect compiler work.
		struct InstructionsState {
		ABataev [Done]: Please, follow the coding standard in the new code. Also, some of the functions can be replaced by some standard functions, just like in this case it can be replaced by `llvm::all_of`
		/// The very first instruction in the list with the main opcode.
		Value *OpValue = nullptr;
		/// The main opcode for the list of instructions.
		unsigned Opcode = 0;
		filcab [Not Done]: Put the pointer first, to minimize padding. Feel free to keep the constructor's parameter list in the current order if it makes more sense, of course.
		/// Some of the instructions in the list have alternate opcodes.
		bool IsAltShuffle = false;
		InstructionsState() = default;
		InstructionsState(Value *OpValue, unsigned Opcode, bool IsAltShuffle)
		: OpValue(OpValue), Opcode(Opcode), IsAltShuffle(IsAltShuffle) {}
		};
		} // namespace

/// \returns The opcode if all of the Instructions in \p VL have the same		/// \returns The opcode if all of the Instructions in \p VL have the same
		ABataev [Not Done]: Restore the original function here.
		ABataev [Done]: I think this function also should accept `ArrayRef<InstructionOrPseudo *>` as an input param
/// opcode, or zero.		/// opcode, or zero.
static unsigned getSameOpcode(ArrayRef<Value *> VL) {		static InstructionsState getSameOpcode(ArrayRef<Value *> VL) {
		RKSimon [Not Done]: This comment needs rewriting as it now returns an InstructionState and you needs to explain all the new cases in which it returns.
Instruction *I0 = dyn_cast<Instruction>(VL[0]);		auto Res = getMainOpcode(VL);
		ABataev [Not Done]: I think, the pseudo instruction is always can be considered as the one with the same block, because you can put it easily into the current block
		dtemirbulatov (Author) [Done]: I have this functionality, but let us for now minimize functionality change here, I will follow this change right after that check-in.
if (!I0)		unsigned Opcode = Res.Opcode;
return 0;		if (!Res.NeedAnalysis && !Res.HasAltOpcodes)
unsigned Opcode = I0->getOpcode();		return InstructionsState(VL[Res.OpcodePos], Opcode, false);
		RKSimon [Not Done]: For clarity, use the InstructionsState constructor not an initializer.
for (int i = 1, e = VL.size(); i < e; i++) {		auto *OpInst = cast<Instruction>(VL[Res.OpcodePos]);
Instruction *I = dyn_cast<Instruction>(VL[i]);		unsigned AltOpcode = getAltOpcode(Opcode);
if (!I || Opcode != I->getOpcode()) {		for (int Cnt = 0, E = VL.size(); Cnt < E; Cnt++) {
		RKSimon [Not Done]: Comments here on what is being attempted in the loop
if (canCombineAsAltInst(Opcode) && i == 1)		auto *I = cast<Instruction>(VL[Cnt]);
return isAltInst(VL);		unsigned InstOpcode = I->getOpcode();
return 0;		if (Res.NeedAnalysis && !sameOpcodeOrAlt(Opcode, AltOpcode, InstOpcode))
		if (tryToRepresentAsInstArg(InstOpcode, OpInst))
		InstOpcode = (Res.HasAltOpcodes && isOdd(Cnt)) ? AltOpcode : Opcode;
		if ((Res.HasAltOpcodes &&
		InstOpcode != (isOdd(Cnt) ? AltOpcode : Opcode)) ||
		(!Res.HasAltOpcodes && InstOpcode != Opcode)) {
		return InstructionsState(OpInst, 0, false);
		RKSimon [Not Done]: For clarity, use the InstructionsState constructor not an initializer.
}		}
}		}
return Opcode;		return InstructionsState(OpInst, Opcode, Res.HasAltOpcodes);
		RKSimon [Not Done]: For clarity, use the InstructionsState constructor not an initializer.
}		}
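The per-lane check in the new `getSameOpcode` expects the main opcode on every lane, or, when alternate opcodes are present, the main opcode on even lanes and the alternate one on odd lanes. A simplified sketch of that positional check is shown below. It deliberately leaves out the "copyable" lane handling from the patch, and `MockOp`, `isOddLane` and `matchesLanePattern` are hypothetical names introduced only for this example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for llvm::Instruction opcodes.
enum MockOp : unsigned { OpAdd = 1, OpSub, OpMul };

inline bool isOddLane(std::size_t Lane) { return Lane & 1; }

// Returns true if every lane carries the opcode expected at its position:
// Main on all lanes when HasAlt is false; otherwise Main on even lanes and
// Alt on odd lanes.
bool matchesLanePattern(const std::vector<unsigned> &Ops, unsigned Main,
                        unsigned Alt, bool HasAlt) {
  for (std::size_t Lane = 0; Lane < Ops.size(); ++Lane) {
    unsigned Expected = (HasAlt && isOddLane(Lane)) ? Alt : Main;
    if (Ops[Lane] != Expected)
      return false;
  }
  return true;
}
```

In the patch, a mismatch on any lane makes the function return a state with opcode 0, which callers treat as "not vectorizable as one bundle".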

/// \returns true if all of the values in \p VL have the same type or false		/// \returns true if all of the values in \p VL have the same type or false
/// otherwise.		/// otherwise.
static bool allSameType(ArrayRef<Value *> VL) {		static bool allSameType(ArrayRef<Value *> VL) {
		RKSimon [Not Done]: InstructionState was keeping the Base/Alt instructions the same if AltOpcodeNum ==0 (BaseIndex ==AltIndex) - why are you inserting nulls?
		dtemirbulatov (Author) [Not Done]: yes, correct, Thanks.
Type *Ty = VL[0]->getType();		Type *Ty = VL[0]->getType();
for (int i = 1, e = VL.size(); i < e; i++)		for (int i = 1, e = VL.size(); i < e; i++)
if (VL[i]->getType() != Ty)		if (VL[i]->getType() != Ty)
return false;		return false;

return true;		return true;
}		}

/// \returns True if Extract{Value,Element} instruction extracts element Idx.		/// \returns True if Extract{Value,Element} instruction extracts element Idx.
static bool matchExtractIndex(Instruction *E, unsigned Idx, unsigned Opcode) {		static bool matchExtractIndex(Instruction *E, unsigned Idx, unsigned Opcode) {
assert(Opcode == Instruction::ExtractElement ||		assert(Opcode == Instruction::ExtractElement ||
Opcode == Instruction::ExtractValue);		Opcode == Instruction::ExtractValue);
if (Opcode == Instruction::ExtractElement) {		if (Opcode == Instruction::ExtractElement) {
ConstantInt *CI = dyn_cast<ConstantInt>(E->getOperand(1));		ConstantInt *CI = dyn_cast<ConstantInt>(E->getOperand(1));
return CI && CI->getZExtValue() == Idx;		return CI && CI->getZExtValue() == Idx;
} else {		} else {
		filcab [Not Done]: All the changes in this function and its callers can be extracted from this diff, AFAICT.
ExtractValueInst *EI = cast<ExtractValueInst>(E);		ExtractValueInst *EI = cast<ExtractValueInst>(E);
return EI->getNumIndices() == 1 && *EI->idx_begin() == Idx;		return EI->getNumIndices() == 1 && *EI->idx_begin() == Idx;
}		}
}		}

/// \returns True if in-tree use also needs extract. This refers to		/// \returns True if in-tree use also needs extract. This refers to
/// possible scalar operand in vectorized instruction.		/// possible scalar operand in vectorized instruction.
static bool InTreeUserNeedToExtract(Value *Scalar, Instruction *UserInst,		static bool InTreeUserNeedToExtract(Value *Scalar, Instruction *UserInst,
[106 lines hidden; context: public:]
void buildTree(ArrayRef<Value *> Roots,		void buildTree(ArrayRef<Value *> Roots,
ExtraValueToDebugLocsMap &ExternallyUsedValues,		ExtraValueToDebugLocsMap &ExternallyUsedValues,
ArrayRef<Value *> UserIgnoreLst = None);		ArrayRef<Value *> UserIgnoreLst = None);

/// Clear the internal data structures that are created by 'buildTree'.		/// Clear the internal data structures that are created by 'buildTree'.
void deleteTree() {		void deleteTree() {
VectorizableTree.clear();		VectorizableTree.clear();
ScalarToTreeEntry.clear();		ScalarToTreeEntry.clear();
		ExtraScalarToTreeEntry.clear();
		dtemirbulatov (Author) [Not Done]: we don't need to do this.
MustGather.clear();		MustGather.clear();
ExternalUses.clear();		ExternalUses.clear();
NumLoadsWantToKeepOrder = 0;		NumLoadsWantToKeepOrder = 0;
NumLoadsWantToChangeOrder = 0;		NumLoadsWantToChangeOrder = 0;
for (auto &Iter : BlocksSchedules) {		for (auto &Iter : BlocksSchedules) {
BlockScheduling *BS = Iter.second.get();		BlockScheduling *BS = Iter.second.get();
BS->clear();		BS->clear();
}		}
[71 lines hidden; context: private:]

/// \returns the scalarization cost for this list of values. Assuming that		/// \returns the scalarization cost for this list of values. Assuming that
/// this subtree gets vectorized, we may need to extract the values from the		/// this subtree gets vectorized, we may need to extract the values from the
/// roots. This method calculates the cost of extracting the values.		/// roots. This method calculates the cost of extracting the values.
int getGatherCost(ArrayRef<Value *> VL);		int getGatherCost(ArrayRef<Value *> VL);

/// \brief Set the Builder insert point to one after the last instruction in		/// \brief Set the Builder insert point to one after the last instruction in
/// the bundle		/// the bundle
void setInsertPointAfterBundle(ArrayRef<Value *> VL);		void setInsertPointAfterBundle(ArrayRef<Value *> VL, Value *OpValue);

/// \returns a vector from a collection of scalars in \p VL.		/// \returns a vector from a collection of scalars in \p VL.
Value *Gather(ArrayRef<Value *> VL, VectorType *Ty);		Value *Gather(ArrayRef<Value *> VL, VectorType *Ty);

/// \returns whether the VectorizableTree is fully vectorizable and will		/// \returns whether the VectorizableTree is fully vectorizable and will
/// be beneficial even the tree height is tiny.		/// be beneficial even the tree height is tiny.
bool isFullyVectorizableTinyTree();		bool isFullyVectorizableTinyTree();

/// \reorder commutative operands in alt shuffle if they result in		/// \reorder commutative operands in alt shuffle if they result in
/// vectorized code.		/// vectorized code.
void reorderAltShuffleOperands(ArrayRef<Value *> VL,		void reorderAltShuffleOperands(unsigned Opcode, ArrayRef<Value *> VL,
SmallVectorImpl<Value *> &Left,		SmallVectorImpl<Value *> &Left,
SmallVectorImpl<Value *> &Right);		SmallVectorImpl<Value *> &Right);
/// \reorder commutative operands to get better probability of		/// \reorder commutative operands to get better probability of
/// generating vectorized code.		/// generating vectorized code.
void reorderInputsAccordingToOpcode(ArrayRef<Value *> VL,		void reorderInputsAccordingToOpcode(unsigned Opcode, ArrayRef<Value *> VL,
SmallVectorImpl<Value *> &Left,		SmallVectorImpl<Value *> &Left,
		ABataev [Not Done]: Left and Right must be `SmallVectorImpl<InstructionOrPseudo *> &`?
		dtemirbulatov (Author) [Done]: That change further increases the size of this review, we can change that later.
SmallVectorImpl<Value *> &Right);		SmallVectorImpl<Value *> &Right);
struct TreeEntry {		struct TreeEntry {
TreeEntry(std::vector<TreeEntry> &Container)		TreeEntry(std::vector<TreeEntry> &Container)
: Scalars(), VectorizedValue(nullptr), NeedToGather(0),		: Scalars(), VectorizedValue(nullptr), NeedToGather(0),
Container(Container) {}		Container(Container) {}

/// \returns true if the scalars in VL are equal to this entry.		/// \returns true if the scalars in VL are equal to this entry.
bool isSame(ArrayRef<Value *> VL) const {		bool isSame(ArrayRef<Value *> VL) const {
[16 lines hidden; context: struct TreeEntry {]
/// to be a pointer and needs to be able to initialize the child iterator.		/// to be a pointer and needs to be able to initialize the child iterator.
/// Thus we need a reference back to the container to translate the indices		/// Thus we need a reference back to the container to translate the indices
/// to entries.		/// to entries.
std::vector<TreeEntry> &Container;		std::vector<TreeEntry> &Container;

/// The TreeEntry index containing the user of this entry. We can actually		/// The TreeEntry index containing the user of this entry. We can actually
/// have multiple users so the data structure is not truly a tree.		/// have multiple users so the data structure is not truly a tree.
SmallVector<int, 1> UserTreeIndices;		SmallVector<int, 1> UserTreeIndices;

		ABataev [Done]: Restore the original
		/// Info about instruction in this tree entry.
		InstructionsState State;
};		};

/// Create a new VectorizableTree entry.		/// Create a new VectorizableTree entry.
TreeEntry *newTreeEntry(ArrayRef<Value *> VL, bool Vectorized,		TreeEntry *newTreeEntry(ArrayRef<Value *> VL, bool Vectorized,
		ABataev [Not Done]: Instead of `ArrayRef<Value *>` I expected to see something like `ArrayRef<InstructionOrPseudoInstruction>`, where `InstructionOrPseudoInstruction` is the class that represents the instruction/Value itself, or the pseudo instruction.
int &UserTreeIdx) {		int &UserTreeIdx, const InstructionsState &S) {
		assert((!Vectorized || S.Opcode != 0) &&
		"Vectorized TreeEntry without opcode");
VectorizableTree.emplace_back(VectorizableTree);		VectorizableTree.emplace_back(VectorizableTree);
int idx = VectorizableTree.size() - 1;		int idx = VectorizableTree.size() - 1;
TreeEntry *Last = &VectorizableTree[idx];		TreeEntry *Last = &VectorizableTree[idx];
Last->Scalars.insert(Last->Scalars.begin(), VL.begin(), VL.end());		Last->Scalars.insert(Last->Scalars.begin(), VL.begin(), VL.end());
Last->NeedToGather = !Vectorized;		Last->NeedToGather = !Vectorized;
if (Vectorized) {		if (Vectorized) {
		Last->State = S;
		unsigned AltOpcode = getAltOpcode(S.Opcode);
for (int i = 0, e = VL.size(); i != e; ++i) {		for (int i = 0, e = VL.size(); i != e; ++i) {
assert(!getTreeEntry(VL[i]) && "Scalar already in tree!");		unsigned RealOpcode =
ScalarToTreeEntry[VL[i]] = idx;		(S.IsAltShuffle && isOdd(i)) ? AltOpcode : S.Opcode;
		Value *Key = (cast<Instruction>(VL[i])->getOpcode() == RealOpcode)
		? VL[i]
		: S.OpValue;
		assert(!getTreeEntry(VL[i], Key) && "Scalar already in tree!");
		if (VL[i] == Key)
		ScalarToTreeEntry[Key] = idx;
		else
		ExtraScalarToTreeEntry[VL[i]][Key] = idx;
}		}
} else {		} else {
		Last->State.Opcode = 0;
		Last->State.OpValue = VL[0];
		Last->State.IsAltShuffle = false;
MustGather.insert(VL.begin(), VL.end());		MustGather.insert(VL.begin(), VL.end());
}		}

		ABataev [Not Done]: Why do you need this?
if (UserTreeIdx >= 0)		if (UserTreeIdx >= 0)
Last->UserTreeIndices.push_back(UserTreeIdx);		Last->UserTreeIndices.push_back(UserTreeIdx);
UserTreeIdx = idx;		UserTreeIdx = idx;
return Last;		return Last;
}		}

/// -- Vectorization State --		/// -- Vectorization State --
/// Holds all of the tree entries.		/// Holds all of the tree entries.
std::vector<TreeEntry> VectorizableTree;		std::vector<TreeEntry> VectorizableTree;

TreeEntry *getTreeEntry(Value *V) {		TreeEntry *getTreeEntry(Value *V) {
auto I = ScalarToTreeEntry.find(V);		auto I = ScalarToTreeEntry.find(V);
if (I != ScalarToTreeEntry.end())		if (I != ScalarToTreeEntry.end())
return &VectorizableTree[I->second];		return &VectorizableTree[I->second];
		ABataev [Not Done]: Again, you need all this code because you did not implemented what we discussed. Try to use the `InstructionOrPseudoInstruction` like class to represent values/instructions and pseudo-instructions. It should be much easier to implement and a lot of changes will just go away.
		ABataev [Not Done]: Also, seems to me this must be an `InstructionOrPseudoOp`
		dtemirbulatov (Author) [Done]: no, looks like "Instruction" here is more convenient.
return nullptr;		return nullptr;
}		}

const TreeEntry *getTreeEntry(Value *V) const {		const TreeEntry *getTreeEntry(Value *V) const {
auto I = ScalarToTreeEntry.find(V);		auto I = ScalarToTreeEntry.find(V);
if (I != ScalarToTreeEntry.end())		if (I != ScalarToTreeEntry.end())
return &VectorizableTree[I->second];		return &VectorizableTree[I->second];
return nullptr;		return nullptr;
}		}

		TreeEntry *getTreeEntry(Value *V, Value *OpValue) {
		if (V == OpValue)
		return getTreeEntry(V);
		auto I = ExtraScalarToTreeEntry.find(V);
		if (I != ExtraScalarToTreeEntry.end()) {
		auto &STT = I->second;
		auto STTI = STT.find(OpValue);
		if (STTI != STT.end())
		return &VectorizableTree[STTI->second];
		}
		return nullptr;
		}
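The two-level lookup introduced here (a primary map for scalars that are their own key, plus a nested map for scalars that were placed in a bundle under a different leading value) can be sketched with standard containers. This is an assumption-laden illustration: `ValueId` stands in for `Value *`, `TreeIndex`, `insert` and `lookup` are hypothetical names, and `-1` replaces the `nullptr` result of the patch.

```cpp
#include <unordered_map>

using ValueId = int; // Hypothetical stand-in for Value *.

struct TreeIndex {
  // Scalars registered under their own key.
  std::unordered_map<ValueId, int> ScalarToTreeEntry;
  // Scalars registered under a different leading value: V -> (Key -> index).
  std::unordered_map<ValueId, std::unordered_map<ValueId, int>>
      ExtraScalarToTreeEntry;

  // Mirrors the registration step in newTreeEntry from the diff.
  void insert(ValueId V, ValueId Key, int Idx) {
    if (V == Key)
      ScalarToTreeEntry[Key] = Idx;
    else
      ExtraScalarToTreeEntry[V][Key] = Idx;
  }

  // Returns the tree entry index for (V, OpValue), or -1 if there is none.
  int lookup(ValueId V, ValueId OpValue) const {
    if (V == OpValue) {
      auto I = ScalarToTreeEntry.find(V);
      return I == ScalarToTreeEntry.end() ? -1 : I->second;
    }
    auto I = ExtraScalarToTreeEntry.find(V);
    if (I == ExtraScalarToTreeEntry.end())
      return -1;
    auto J = I->second.find(OpValue);
    return J == I->second.end() ? -1 : J->second;
  }
};
```

The nested map lets the same scalar belong to several tree entries, one per leading value, which is what the reviewers suggest could be avoided by modelling pseudo-instructions explicitly.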

/// Maps a specific scalar to its tree entry.		/// Maps a specific scalar to its tree entry.
SmallDenseMap<Value*, int> ScalarToTreeEntry;		SmallDenseMap<Value*, int> ScalarToTreeEntry;
		RKSimon [Not Done]: clang-format NFC pre-commit

		/// Maps a specific scalar to its tree entry(s) with leading scalar.
		SmallDenseMap<Value *, SmallDenseMap<Value *, int>> ExtraScalarToTreeEntry;
		ABataev [Not Done]: Not formatted

/// A list of scalars that we found that we need to keep as scalars.		/// A list of scalars that we found that we need to keep as scalars.
ValueSet MustGather;		ValueSet MustGather;

/// This POD struct describes one external user in the vectorized tree.		/// This POD struct describes one external user in the vectorized tree.
struct ExternalUser {		struct ExternalUser {
ExternalUser (Value *S, llvm::User *U, int L) :		ExternalUser (Value *S, llvm::User *U, int L) :
Scalar(S), User(U), Lane(L){}		Scalar(S), User(U), Lane(L){}
// Which scalar in our function.		// Which scalar in our function.
[74 lines hidden; context: struct ScheduleData {]
// The initial value for the dependency counters. It means that the		// The initial value for the dependency counters. It means that the
// dependencies are not calculated yet.		// dependencies are not calculated yet.
enum { InvalidDeps = -1 };		enum { InvalidDeps = -1 };

ScheduleData()		ScheduleData()
: Inst(nullptr), FirstInBundle(nullptr), NextInBundle(nullptr),		: Inst(nullptr), FirstInBundle(nullptr), NextInBundle(nullptr),
NextLoadStore(nullptr), SchedulingRegionID(0), SchedulingPriority(0),		NextLoadStore(nullptr), SchedulingRegionID(0), SchedulingPriority(0),
Dependencies(InvalidDeps), UnscheduledDeps(InvalidDeps),		Dependencies(InvalidDeps), UnscheduledDeps(InvalidDeps),
UnscheduledDepsInBundle(InvalidDeps), IsScheduled(false) {}		UnscheduledDepsInBundle(InvalidDeps), IsScheduled(false),
		OpValue(nullptr) {}

void init(int BlockSchedulingRegionID) {		void init(int BlockSchedulingRegionID, Value *OpVal) {
FirstInBundle = this;		FirstInBundle = this;
NextInBundle = nullptr;		NextInBundle = nullptr;
NextLoadStore = nullptr;		NextLoadStore = nullptr;
IsScheduled = false;		IsScheduled = false;
SchedulingRegionID = BlockSchedulingRegionID;		SchedulingRegionID = BlockSchedulingRegionID;
UnscheduledDepsInBundle = UnscheduledDeps;		UnscheduledDepsInBundle = UnscheduledDeps;
clearDependencies();		clearDependencies();
		OpValue = OpVal;
		RKSimon [Not Done]: Can you just change the name of either the argument or member to avoid the this-> ?
}		}
		ABataev [Done]: Why do you need this new function? No comments and explanation.

/// Returns true if the dependency information has been calculated.		/// Returns true if the dependency information has been calculated.
bool hasValidDependencies() const { return Dependencies != InvalidDeps; }		bool hasValidDependencies() const { return Dependencies != InvalidDeps; }

/// Returns true for single instructions and for bundle representatives		/// Returns true for single instructions and for bundle representatives
/// (= the head of a bundle).		/// (= the head of a bundle).
bool isSchedulingEntity() const { return FirstInBundle == this; }		bool isSchedulingEntity() const { return FirstInBundle == this; }

[29 lines hidden; context: void clearDependencies() {]
Dependencies = InvalidDeps;		Dependencies = InvalidDeps;
resetUnscheduledDeps();		resetUnscheduledDeps();
MemoryDependencies.clear();		MemoryDependencies.clear();
}		}

void dump(raw_ostream &os) const {		void dump(raw_ostream &os) const {
if (!isSchedulingEntity()) {		if (!isSchedulingEntity()) {
os << "/ " << *Inst;		os << "/ " << *Inst;
} else if (NextInBundle) {		} else if (NextInBundle) {
		ABataev [Done]: Pseudo instruction must be just ignored?
os << '[' << *Inst;		os << '[' << *Inst;
ScheduleData *SD = NextInBundle;		ScheduleData *SD = NextInBundle;
while (SD) {		while (SD) {
os << ';' << *SD->Inst;		os << ';' << *SD->Inst;
SD = SD->NextInBundle;		SD = SD->NextInBundle;
}		}
os << ']';		os << ']';
} else {		} else {
[41 lines hidden; context: struct ScheduleData {]

/// The sum of UnscheduledDeps in a bundle. Equals to UnscheduledDeps for		/// The sum of UnscheduledDeps in a bundle. Equals to UnscheduledDeps for
/// single instructions.		/// single instructions.
int UnscheduledDepsInBundle;		int UnscheduledDepsInBundle;

/// True if this instruction is scheduled (or considered as scheduled in the		/// True if this instruction is scheduled (or considered as scheduled in the
/// dry-run).		/// dry-run).
bool IsScheduled;		bool IsScheduled;

		/// Opcode of the current instruction in the schedule data.
		Value *OpValue;
};		};

#ifndef NDEBUG		#ifndef NDEBUG
friend inline raw_ostream &operator<<(raw_ostream &os,		friend inline raw_ostream &operator<<(raw_ostream &os,
const BoUpSLP::ScheduleData &SD) {		const BoUpSLP::ScheduleData &SD) {
SD.dump(os);		SD.dump(os);
return os;		return os;
}		}
[12 lines hidden]	BlockScheduling(BasicBlock *BB)
ScheduleRegionSize(0),		ScheduleRegionSize(0),
ScheduleRegionSizeLimit(ScheduleRegionSizeBudget),		ScheduleRegionSizeLimit(ScheduleRegionSizeBudget),
// Make sure that the initial SchedulingRegionID is greater than the		// Make sure that the initial SchedulingRegionID is greater than the
// initial SchedulingRegionID in ScheduleData (which is 0).		// initial SchedulingRegionID in ScheduleData (which is 0).
SchedulingRegionID(1) {}		SchedulingRegionID(1) {}

void clear() {		void clear() {
ReadyInsts.clear();		ReadyInsts.clear();
ScheduleStart = nullptr;		ScheduleStart = nullptr;
		ABataev (Not Done): Very strange that you still need a particular scheduling class for the pseudo instructions. I think you can use the original data structure if you correctly implement the pseudo instruction itself. I still don't see all the changes we talked about.
		dtemirbulatov (Author, Not Done): OK, I am implementing it now.
ScheduleEnd = nullptr;		ScheduleEnd = nullptr;
FirstLoadStoreInRegion = nullptr;		FirstLoadStoreInRegion = nullptr;
LastLoadStoreInRegion = nullptr;		LastLoadStoreInRegion = nullptr;

// Reduce the maximum schedule region size by the size of the		// Reduce the maximum schedule region size by the size of the
// previous scheduling run.		// previous scheduling run.
ScheduleRegionSizeLimit -= ScheduleRegionSize;		ScheduleRegionSizeLimit -= ScheduleRegionSize;
if (ScheduleRegionSizeLimit < MinScheduleRegionSize)		if (ScheduleRegionSizeLimit < MinScheduleRegionSize)
ScheduleRegionSizeLimit = MinScheduleRegionSize;		ScheduleRegionSizeLimit = MinScheduleRegionSize;
ScheduleRegionSize = 0;		ScheduleRegionSize = 0;

// Make a new scheduling region, i.e. all existing ScheduleData is not		// Make a new scheduling region, i.e. all existing ScheduleData is not
// in the new region yet.		// in the new region yet.
++SchedulingRegionID;		++SchedulingRegionID;
}		}
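
The bookkeeping in `clear()` can be illustrated in isolation. The following is a minimal sketch, not the LLVM code: `RegionState`, `Entry`, and the numeric values are hypothetical stand-ins. Only the two ideas are taken from the listing above: the size budget shrinks by the size of the previous run but never drops below a minimum, and incrementing the region ID makes every previously tagged entry count as outside the new region without touching it.

```cpp
#include <cassert>

// Hypothetical stand-in for a per-instruction record; only the region tag
// matters for this sketch.
struct Entry {
  int SchedulingRegionID = 0;
};

// Hypothetical stand-in for the region bookkeeping of a block scheduler.
struct RegionState {
  int RegionSize = 0;
  int RegionSizeLimit;
  int MinRegionSize;
  int RegionID = 1; // starts above the default ID stored in Entry (0)

  RegionState(int Budget, int MinSize)
      : RegionSizeLimit(Budget), MinRegionSize(MinSize) {
    assert(MinSize <= Budget && "minimum must not exceed the budget");
  }

  // Same shape as clear() in the listing: shrink the budget by the size of
  // the previous run, clamp it, and start a new region.
  void clear() {
    RegionSizeLimit -= RegionSize;
    if (RegionSizeLimit < MinRegionSize)
      RegionSizeLimit = MinRegionSize;
    RegionSize = 0;
    ++RegionID;
  }

  // Tags an entry as belonging to the current region.
  void add(Entry &E) {
    E.SchedulingRegionID = RegionID;
    ++RegionSize;
  }

  bool isInRegion(const Entry &E) const {
    return E.SchedulingRegionID == RegionID;
  }
};
```

With this layout, stale entries never need to be erased: a lookup only has to compare the stored ID with the current one, which is what `getScheduleData` and `isInSchedulingRegion` do in the patch.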

ScheduleData *getScheduleData(Value *V) {		ScheduleData *getScheduleData(Value *V) {
ScheduleData *SD = ScheduleDataMap[V];		ScheduleData *SD = ScheduleDataMap[V];
		ABataev (Done): Again, I think it must be `InstructionOrPseudoOp`.
if (SD && SD->SchedulingRegionID == SchedulingRegionID)		if (SD && SD->SchedulingRegionID == SchedulingRegionID)
return SD;		return SD;
return nullptr;		return nullptr;
}		}

		ScheduleData *getScheduleData(Value *V, Value *Key) {
		if (V == Key)
		return getScheduleData(V);
		auto I = ExtraScheduleDataMap.find(V);
		if (I != ExtraScheduleDataMap.end()) {
		ScheduleData *SD = I->second[Key];
		if (SD && SD->SchedulingRegionID == SchedulingRegionID)
		return SD;
		}
		return nullptr;
		}

bool isInSchedulingRegion(ScheduleData *SD) {		bool isInSchedulingRegion(ScheduleData *SD) {
return SD->SchedulingRegionID == SchedulingRegionID;		return SD->SchedulingRegionID == SchedulingRegionID;
}		}

/// Marks an instruction as scheduled and puts all dependent ready		/// Marks an instruction as scheduled and puts all dependent ready
/// instructions into the ready-list.		/// instructions into the ready-list.
template <typename ReadyListType>		template <typename ReadyListType>
void schedule(ScheduleData *SD, ReadyListType &ReadyList) {		void schedule(ScheduleData *SD, ReadyListType &ReadyList) {
SD->IsScheduled = true;		SD->IsScheduled = true;
DEBUG(dbgs() << "SLP: schedule " << *SD << "\n");		DEBUG(dbgs() << "SLP: schedule " << *SD << "\n");

ScheduleData *BundleMember = SD;		ScheduleData *BundleMember = SD;
while (BundleMember) {		while (BundleMember) {
		if (BundleMember->Inst == BundleMember->OpValue) {
// Handle the def-use chain dependencies.		// Handle the def-use chain dependencies.
for (Use &U : BundleMember->Inst->operands()) {		for (Use &U : BundleMember->Inst->operands()) {
ScheduleData *OpDef = getScheduleData(U.get());		auto *I = dyn_cast<Instruction>(U.get());
		if (!I)
		continue;
		doForAllOpcodes(I, [&ReadyList](ScheduleData *OpDef) {
if (OpDef && OpDef->hasValidDependencies() &&		if (OpDef && OpDef->hasValidDependencies() &&
OpDef->incrementUnscheduledDeps(-1) == 0) {		OpDef->incrementUnscheduledDeps(-1) == 0) {
// There are no more unscheduled dependencies after decrementing,		// There are no more unscheduled dependencies after
		// decrementing,
// so we can put the dependent instruction into the ready list.		// so we can put the dependent instruction into the ready list.
ScheduleData *DepBundle = OpDef->FirstInBundle;		ScheduleData *DepBundle = OpDef->FirstInBundle;
assert(!DepBundle->IsScheduled &&		assert(!DepBundle->IsScheduled &&
"already scheduled bundle gets ready");		"already scheduled bundle gets ready");
ReadyList.insert(DepBundle);		ReadyList.insert(DepBundle);
DEBUG(dbgs() << "SLP: gets ready (def): " << *DepBundle << "\n");		DEBUG(dbgs()
		<< "SLP: gets ready (def): " << *DepBundle << "\n");
}		}
		});
}		}
// Handle the memory dependencies.		// Handle the memory dependencies.
for (ScheduleData *MemoryDepSD : BundleMember->MemoryDependencies) {		for (ScheduleData *MemoryDepSD : BundleMember->MemoryDependencies) {
if (MemoryDepSD->incrementUnscheduledDeps(-1) == 0) {		if (MemoryDepSD->incrementUnscheduledDeps(-1) == 0) {
// There are no more unscheduled dependencies after decrementing,		// There are no more unscheduled dependencies after decrementing,
// so we can put the dependent instruction into the ready list.		// so we can put the dependent instruction into the ready list.
ScheduleData *DepBundle = MemoryDepSD->FirstInBundle;		ScheduleData *DepBundle = MemoryDepSD->FirstInBundle;
assert(!DepBundle->IsScheduled &&		assert(!DepBundle->IsScheduled &&
"already scheduled bundle gets ready");		"already scheduled bundle gets ready");
ReadyList.insert(DepBundle);		ReadyList.insert(DepBundle);
DEBUG(dbgs() << "SLP: gets ready (mem): " << *DepBundle << "\n");		DEBUG(dbgs() << "SLP: gets ready (mem): " << *DepBundle
		<< "\n");
		}
}		}
}		}
BundleMember = BundleMember->NextInBundle;		BundleMember = BundleMember->NextInBundle;
}		}
}		}
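
The ready-list mechanism used by `schedule()` can be sketched without any LLVM types. This is an illustration under stated assumptions, not the patch's implementation: `Node` and its fields are hypothetical, bundles and memory dependencies are left out, and each node simply counts how many of its users are still unscheduled. Scheduling a user decrements the counter of every definition it uses, and a definition becomes ready once its counter reaches zero.

```cpp
#include <cassert>
#include <vector>

// Hypothetical node of the dependency graph.
struct Node {
  int UnscheduledDeps = 0;      // users of this node that are not scheduled yet
  bool IsScheduled = false;
  std::vector<Node *> Operands; // definitions this node uses
};

// Marks N as scheduled and moves every operand whose last unscheduled user
// was N into the ready list.
void schedule(Node *N, std::vector<Node *> &ReadyList) {
  N->IsScheduled = true;
  for (Node *Def : N->Operands) {
    if (--Def->UnscheduledDeps == 0) {
      assert(!Def->IsScheduled && "already scheduled node gets ready");
      ReadyList.push_back(Def);
    }
  }
}
```

In the patch, the same decrement is applied through `doForAllOpcodes`, so that every `ScheduleData` attached to an operand (the regular one and any extra ones) has its counter updated.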

		void doForAllOpcodes(Value *V,
		function_ref<void(ScheduleData *SD)> Action) {
		dtemirbulatov (Author, Not Done): Here is the fix for the first issue related to the last commit: we have to check SD before use because it might no longer be valid at that point.
		if (ScheduleData *SD = getScheduleData(V))
		Action(SD);
		auto I = ExtraScheduleDataMap.find(V);
		if (I != ExtraScheduleDataMap.end())
		for (auto &P : I->second)
		if (P.second->SchedulingRegionID == SchedulingRegionID)
		Action(P.second);
		}
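
The two-level lookup behind `doForAllOpcodes` can be sketched with standard containers. This is a hedged illustration, not the patch itself: `ScheduleInfo`, `Lookup`, and the use of `int` IDs in place of `Value *` pointers are assumptions made to keep the example self-contained. The primary map holds one record per value, the extra map holds additional records per (value, key) pair, and only records tagged with the current region ID are visited.

```cpp
#include <cassert>
#include <functional>
#include <map>

// Hypothetical record; only the region tag is modelled.
struct ScheduleInfo {
  int RegionID;
};

// Hypothetical lookup structure; ints stand in for Value pointers.
struct Lookup {
  int CurrentRegionID = 1;
  std::map<int, ScheduleInfo *> Primary;              // value -> own record
  std::map<int, std::map<int, ScheduleInfo *>> Extra; // value -> key -> record

  // Returns the primary record of V if it belongs to the current region.
  ScheduleInfo *get(int V) const {
    auto It = Primary.find(V);
    if (It != Primary.end() && It->second &&
        It->second->RegionID == CurrentRegionID)
      return It->second;
    return nullptr;
  }

  // Calls Action on every current-region record attached to V.
  void forAll(int V, const std::function<void(ScheduleInfo *)> &Action) const {
    if (ScheduleInfo *SI = get(V))
      Action(SI);
    auto It = Extra.find(V);
    if (It == Extra.end())
      return;
    for (const auto &P : It->second) {
      assert(P.second && "extra map must not hold null records");
      if (P.second->RegionID == CurrentRegionID)
        Action(P.second);
    }
  }
};
```

Keeping the primary record in its own map leaves the common single-record lookup cheap; whether both maps could be merged into one is the question raised in the review.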

/// Put all instructions into the ReadyList which are ready for scheduling.		/// Put all instructions into the ReadyList which are ready for scheduling.
		ABataev (Not Done): Again, `InstructionOrPseudoOp`.
template <typename ReadyListType>		template <typename ReadyListType>
void initialFillReadyList(ReadyListType &ReadyList) {		void initialFillReadyList(ReadyListType &ReadyList) {
for (auto *I = ScheduleStart; I != ScheduleEnd; I = I->getNextNode()) {		for (auto *I = ScheduleStart; I != ScheduleEnd; I = I->getNextNode()) {
ScheduleData *SD = getScheduleData(I);		doForAllOpcodes(I, [&ReadyList, I](ScheduleData *SD) {
if (SD->isSchedulingEntity() && SD->isReady()) {		if (SD->isSchedulingEntity() && SD->isReady()) {
ReadyList.insert(SD);		ReadyList.insert(SD);
DEBUG(dbgs() << "SLP: initially in ready list: " << *I << "\n");		DEBUG(dbgs() << "SLP: initially in ready list: " << *I << "\n");
}		}
		});
}		}
}		}

/// Checks if a bundle of instructions can be scheduled, i.e. has no		/// Checks if a bundle of instructions can be scheduled, i.e. has no
/// cyclic dependencies. This is only a dry-run, no instructions are		/// cyclic dependencies. This is only a dry-run, no instructions are
/// actually moved at this stage.		/// actually moved at this stage.
bool tryScheduleBundle(ArrayRef<Value *> VL, BoUpSLP *SLP, Value *OpValue);		bool tryScheduleBundle(ArrayRef<Value *> VL, BoUpSLP *SLP, Value *OpValue);

/// Un-bundles a group of instructions.		/// Un-bundles a group of instructions.
void cancelScheduling(ArrayRef<Value *> VL, Value *OpValue);		void cancelScheduling(ArrayRef<Value *> VL, Value *OpValue);

		/// Allocates schedule data chunk.
		ScheduleData *allocateScheduleDataChunks();

/// Extends the scheduling region so that V is inside the region.		/// Extends the scheduling region so that V is inside the region.
/// \returns true if the region size is within the limit.		/// \returns true if the region size is within the limit.
bool extendSchedulingRegion(Value *V);		bool extendSchedulingRegion(Value *V, Value *OpValue);

/// Initialize the ScheduleData structures for new instructions in the		/// Initialize the ScheduleData structures for new instructions in the
/// scheduling region.		/// scheduling region.
void initScheduleData(Instruction *FromI, Instruction *ToI,		void initScheduleData(Instruction *FromI, Instruction *ToI,
ScheduleData *PrevLoadStore,		ScheduleData *PrevLoadStore,
ScheduleData *NextLoadStore);		ScheduleData *NextLoadStore);

/// Updates the dependency information of a bundle and of all instructions/		/// Updates the dependency information of a bundle and of all instructions/
/// bundles which depend on the original bundle.		/// bundles which depend on the original bundle.
void calculateDependencies(ScheduleData *SD, bool InsertInReadyList,		void calculateDependencies(ScheduleData *SD, bool InsertInReadyList,
BoUpSLP *SLP);		BoUpSLP *SLP);

/// Sets all instruction in the scheduling region to un-scheduled.		/// Sets all instruction in the scheduling region to un-scheduled.
void resetSchedule();		void resetSchedule();

BasicBlock *BB;		BasicBlock *BB;
		ABataev (Done): If you implement `InstructionOrPseudoOp` correctly, this reorder stuff should not be required, I think.
		dtemirbulatov (Author, Done): Well, there should be several (>=2) independent scheduling events (one for the real instruction and others for the pseudos), and there is just one real instruction in the end. I don't see how it could be done without reordering or tracking the last scheduled instance for the same instruction. We could introduce something like an IsLastScheduled field in the ScheduleData struct, but it would be quite similar to reordering.
		ABataev (Not Done): If you add the real instruction instead of this pseudo-instruction, will you need all these scheduling events? No. Will you need to do some extra reordering etc.? No. Why can't you simulate it with the new class/structure?
		dtemirbulatov (Author, Done): Do you mean that the last one scheduling becomes the real one and we just ignore any pseudos?
		dtemirbulatov (Author, Done): Do you mean that the last one scheduled becomes the real one and we just ignore any pseudos?
		ABataev (Not Done): No. I mean that if you insert the real instructions instead of those pseudo-instructions, you won't need all that reordering/new scheduling etc. Why can't you mimic this behavior with the pseudo-instruction?
		dtemirbulatov (Author, Done): Hmm, there are at least the NextLoadStore dependencies that we break if we, for example, insert a real Load instruction somewhere. Or we could recalculate NextLoadStore. Or maybe mimic the pseudo in some other way.
		ABataev (Not Done): I don't think that they can be broken, as we're not going to insert new Loads/Stores, just some binops. So the loads/stores and the corresponding dependencies should not be affected.

/// Simple memory allocation for ScheduleData.		/// Simple memory allocation for ScheduleData.
std::vector<std::unique_ptr<ScheduleData[]>> ScheduleDataChunks;		std::vector<std::unique_ptr<ScheduleData[]>> ScheduleDataChunks;

/// The size of a ScheduleData array in ScheduleDataChunks.		/// The size of a ScheduleData array in ScheduleDataChunks.
int ChunkSize;		int ChunkSize;

/// The allocator position in the current chunk, which is the last entry		/// The allocator position in the current chunk, which is the last entry
/// of ScheduleDataChunks.		/// of ScheduleDataChunks.
int ChunkPos;		int ChunkPos;

/// Attaches ScheduleData to Instruction.		/// Attaches ScheduleData to Instruction.
/// Note that the mapping survives during all vectorization iterations, i.e.		/// Note that the mapping survives during all vectorization iterations, i.e.
/// ScheduleData structures are recycled.		/// ScheduleData structures are recycled.
DenseMap<Value *, ScheduleData *> ScheduleDataMap;		DenseMap<Value *, ScheduleData *> ScheduleDataMap;
		ABataev (Done): Why can't you store everything in a single map?

		/// Attaches ScheduleData to Instruction with the leading key.
		DenseMap<Value *, SmallDenseMap<Value *, ScheduleData *>>
		ExtraScheduleDataMap;

struct ReadyList : SmallVector<ScheduleData *, 8> {		struct ReadyList : SmallVector<ScheduleData *, 8> {
void insert(ScheduleData *SD) { push_back(SD); }		void insert(ScheduleData *SD) { push_back(SD); }
};		};

/// The ready-list for scheduling (only used for the dry-run).		/// The ready-list for scheduling (only used for the dry-run).
ReadyList ReadyInsts;		ReadyList ReadyInsts;

/// The first instruction of the scheduling region.		/// The first instruction of the scheduling region.
[159 lines hidden]	void BoUpSLP::buildTree(ArrayRef<Value *> Roots,
for (TreeEntry &EIdx : VectorizableTree) {		for (TreeEntry &EIdx : VectorizableTree) {
TreeEntry *Entry = &EIdx;		TreeEntry *Entry = &EIdx;

// No need to handle users of gathered values.		// No need to handle users of gathered values.
if (Entry->NeedToGather)		if (Entry->NeedToGather)
continue;		continue;

// For each lane:		// For each lane:
		const unsigned Opcode = Entry->State.Opcode;
		const unsigned AltOpcode = getAltOpcode(Opcode);
for (int Lane = 0, LE = Entry->Scalars.size(); Lane != LE; ++Lane) {		for (int Lane = 0, LE = Entry->Scalars.size(); Lane != LE; ++Lane) {
		RKSimon (Not Done): `for (auto *Scalar : Entry->Scalars)`, as an NFC pre-commit if possible.
		dtemirbulatov (Author, Not Done): Hmm, we actually need the "Lane" variable in the logic below.
Value *Scalar = Entry->Scalars[Lane];		Value *Scalar = Entry->Scalars[Lane];

		if (!sameOpcodeOrAlt(Opcode, AltOpcode,
		cast<Instruction>(Scalar)->getOpcode()))
		continue;

// Check if the scalar is externally used as an extra arg.		// Check if the scalar is externally used as an extra arg.
auto ExtI = ExternallyUsedValues.find(Scalar);		auto ExtI = ExternallyUsedValues.find(Scalar);
if (ExtI != ExternallyUsedValues.end()) {		if (ExtI != ExternallyUsedValues.end()) {
DEBUG(dbgs() << "SLP: Need to extract: Extra arg from lane " <<		DEBUG(dbgs() << "SLP: Need to extract: Extra arg from lane " <<
Lane << " from " << *Scalar << ".\n");		Lane << " from " << *Scalar << ".\n");
ExternalUses.emplace_back(Scalar, nullptr, Lane);		ExternalUses.emplace_back(Scalar, nullptr, Lane);
continue;		continue;
}		}
[16 lines hidden]	for (int Lane = 0, LE = Entry->Scalars.size(); Lane != LE; ++Lane) {
<< ".\n");		<< ".\n");
assert(!UseEntry->NeedToGather && "Bad state");		assert(!UseEntry->NeedToGather && "Bad state");
continue;		continue;
}		}
}		}

// Ignore users in the user ignore list.		// Ignore users in the user ignore list.
if (is_contained(UserIgnoreList, UserInst))		if (is_contained(UserIgnoreList, UserInst))
continue;		continue;
		ABataev (Not Done): 1. Remove the extra parens around `Entry->State.IsNonAlt`. 2. Why are you skipping bundles with non-alternative opcodes only?
		dtemirbulatov (Author, Not Done): Yes, we don't need to check Entry->State.IsNonAlt here; that is similar to getTreeEntry(U, Scalar) == Entry.

DEBUG(dbgs() << "SLP: Need to extract:" << *U << " from lane " <<		DEBUG(dbgs() << "SLP: Need to extract:" << *U << " from lane " <<
Lane << " from " << *Scalar << ".\n");		Lane << " from " << *Scalar << ".\n");
ExternalUses.push_back(ExternalUser(Scalar, U, Lane));		ExternalUses.push_back(ExternalUser(Scalar, U, Lane));
}		}
}		}
}		}
}		}

		static Value *getDefaultConstantForOpcode(unsigned Opcode, Type *Ty) {
		RKSimon (Not Done): Can we use ConstantExpr::getBinOpIdentity instead?
		dtemirbulatov (Author, Not Done): No, ConstantExpr::getBinOpIdentity supports only commutative operations.
		lebedev.ri (Not Done): Are you sure? `Constant *ConstantExpr::getBinOpIdentity(unsigned Opcode, Type *Ty, bool AllowRHSConstant) { ... // Non-commutative opcodes: AllowRHSConstant must be set. if (!AllowRHSConstant) return nullptr;`
		dtemirbulatov (Author, Not Done): Oh, yes, sorry, it is just a different behaviour here. We want, for example, 1 for division, and ConstantExpr::getBinOpIdentity would return 0.
		switch(Opcode) {
		case Instruction::Add:
		case Instruction::Sub:
		case Instruction::Or:
		case Instruction::Xor:
		return ConstantInt::getNullValue(Ty);
		case Instruction::Mul:
		case Instruction::UDiv:
		case Instruction::SDiv:
		case Instruction::URem:
		case Instruction::SRem:
		return ConstantInt::get(Ty, /*V=*/1);
		case Instruction::FAdd:
		case Instruction::FSub:
		return ConstantFP::get(Ty, /*V=*/0.0);
		case Instruction::FMul:
		case Instruction::FDiv:
		case Instruction::FRem:
		return ConstantFP::get(Ty, /*V=*/1.0);
		case Instruction::And:
		return ConstantInt::getAllOnesValue(Ty);
		case Instruction::Shl:
		case Instruction::LShr:
		case Instruction::AShr:
		return ConstantInt::getNullValue(Type::getInt32Ty(Ty->getContext()));
		default:
		break;
		}
		llvm_unreachable("unknown binop for default constant value");
		}
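
The intent of the default constants can be checked on plain integers: a "copyable" element `x` is rewritten as `x op C`, so `C` has to leave `x` unchanged. The sketch below is an assumption-laden illustration, not LLVM code: the `Op` enum, `defaultConstant`, `apply`, and `isNoOp` are hypothetical names, values are modelled as `unsigned`, and the floating-point and signed cases are omitted. It follows the grouping of the switch above and also shows the remainder caveat raised in the review discussion: `x % 1` is 0, not `x`.

```cpp
#include <cassert>

// Hypothetical opcode set covering the unsigned integer cases only.
enum class Op { Add, Sub, Mul, UDiv, URem, And, Or, Xor, Shl };

// Right-hand constant per opcode, grouped as in the switch above.
unsigned defaultConstant(Op O) {
  switch (O) {
  case Op::Add:
  case Op::Sub:
  case Op::Or:
  case Op::Xor:
  case Op::Shl:
    return 0u;
  case Op::Mul:
  case Op::UDiv:
  case Op::URem:
    return 1u;
  case Op::And:
    return ~0u; // all bits set
  }
  return 0u; // not reached for the enumerators above
}

// Evaluates X op C on unsigned integers.
unsigned apply(Op O, unsigned X, unsigned C) {
  switch (O) {
  case Op::Add:  return X + C;
  case Op::Sub:  return X - C;
  case Op::Mul:  return X * C;
  case Op::UDiv: assert(C != 0u && "division by zero"); return X / C;
  case Op::URem: assert(C != 0u && "remainder by zero"); return X % C;
  case Op::And:  return X & C;
  case Op::Or:   return X | C;
  case Op::Xor:  return X ^ C;
  case Op::Shl:  return X << C;
  }
  return X; // not reached for the enumerators above
}

// True if pairing X with the default constant leaves X unchanged.
bool isNoOp(Op O, unsigned X) {
  return apply(O, X, defaultConstant(O)) == X;
}
```

For a nonzero `x`, every opcode except the remainder passes this check, which matches the reviewer's later suggestion that remainder operations may be combined with others but should not be chosen as the main opcode.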

void BoUpSLP::buildTree_rec(ArrayRef<Value *> VL, unsigned Depth,		void BoUpSLP::buildTree_rec(ArrayRef<Value *> VL, unsigned Depth,
int UserTreeIdx) {		int UserTreeIdx) {
bool isAltShuffle = false;
assert((allConstant(VL) || allSameType(VL)) && "Invalid types!");		assert((allConstant(VL) || allSameType(VL)) && "Invalid types!");

		InstructionsState S = getSameOpcode(VL);
		ABataev (Not Done): Remove it.
if (Depth == RecursionMaxDepth) {		if (Depth == RecursionMaxDepth) {
DEBUG(dbgs() << "SLP: Gathering due to max recursion depth.\n");		DEBUG(dbgs() << "SLP: Gathering due to max recursion depth.\n");
newTreeEntry(VL, false, UserTreeIdx);		newTreeEntry(VL, false, UserTreeIdx, S);
return;		return;
}		}

// Don't handle vectors.		// Don't handle vectors.
if (VL[0]->getType()->isVectorTy()) {		if (S.OpValue->getType()->isVectorTy()) {
DEBUG(dbgs() << "SLP: Gathering due to vector type.\n");		DEBUG(dbgs() << "SLP: Gathering due to vector type.\n");
newTreeEntry(VL, false, UserTreeIdx);		newTreeEntry(VL, false, UserTreeIdx, S);
return;		return;
}		}

if (StoreInst *SI = dyn_cast<StoreInst>(VL[0]))		if (StoreInst *SI = dyn_cast<StoreInst>(S.OpValue))
if (SI->getValueOperand()->getType()->isVectorTy()) {		if (SI->getValueOperand()->getType()->isVectorTy()) {
DEBUG(dbgs() << "SLP: Gathering due to store vector type.\n");		DEBUG(dbgs() << "SLP: Gathering due to store vector type.\n");
newTreeEntry(VL, false, UserTreeIdx);		newTreeEntry(VL, false, UserTreeIdx, S);
return;		return;
}		}
unsigned Opcode = getSameOpcode(VL);

// Check that this shuffle vector refers to the alternate
// sequence of opcodes.
if (Opcode == Instruction::ShuffleVector) {
Instruction *I0 = dyn_cast<Instruction>(VL[0]);
unsigned Op = I0->getOpcode();
if (Op != Instruction::ShuffleVector)
isAltShuffle = true;
}

// If all of the operands are identical or constant we have a simple solution.		// If all of the operands are identical or constant we have a simple solution.
		RKSimon (Not Done): @spatel @dtemirbulatov Can we use getBinOpIdentity yet?
		spatel (Not Done): Yes. If anyone has suggestions for making that 'AllowRHSConstant' param clearer, let me know. The only problem that I see is that this code is returning +0.0 as the default constant for an fadd (because the caller guarantees 'nsz'?). I worked around something like that here: rL346143. I can't tell if SLP would want to do something like that.
		dtemirbulatov (Author, Not Done): OK, yes, it looks like it is going to work.
if (allConstant(VL) || isSplat(VL) || !allSameBlock(VL) || !Opcode) {		if (allConstant(VL) || isSplat(VL) || !allSameBlock(VL) || !S.Opcode) {
DEBUG(dbgs() << "SLP: Gathering due to C,S,B,O. \n");		DEBUG(dbgs() << "SLP: Gathering due to C,S,B,O. \n");
newTreeEntry(VL, false, UserTreeIdx);		newTreeEntry(VL, false, UserTreeIdx, S);
return;		return;
}		}

// We now know that this is a vector of instructions of the same type from		// We now know that this is a vector of instructions of the same type from
// the same block.		// the same block.
		dtemirbulatov (Author, Not Done): This is the place where we check if a vector is diverse.
		ABataev (Not Done): What are you trying to do here? What's the real problem?
		dtemirbulatov (Author, Not Done): There are two issues here: 1) We cannot combine a remainder operation with any other operation in one vector, because rem(integer, 1) != integer. 2) Sometimes we could schedule a non-alternative operation, for example a load, together with other operations, for example adds, while the load is at the bottom of the BB, and that could result in a dominance error for the constructed vector. There are two examples in pr35497.ll.
		ABataev (Not Done): 1. I think you can combine it, but you cannot choose it as the main opcode. You need to change tryToRepresentAsInstArg to exclude remainder operations, I think. 2. Still don't understand. Could you give some more details?
		dtemirbulatov (Author, Not Done): 1. Done. 2. Done after talking to Alexey offline.

// Don't vectorize ephemeral values.		// Don't vectorize ephemeral values.
for (unsigned i = 0, e = VL.size(); i != e; ++i) {		for (unsigned i = 0, e = VL.size(); i != e; ++i) {
if (EphValues.count(VL[i])) {		if (EphValues.count(VL[i])) {
DEBUG(dbgs() << "SLP: The instruction (" << *VL[i] <<		DEBUG(dbgs() << "SLP: The instruction (" << *VL[i] <<
") is ephemeral.\n");		") is ephemeral.\n");
newTreeEntry(VL, false, UserTreeIdx);		newTreeEntry(VL, false, UserTreeIdx, S);
return;		return;
		ABataev (Not Done): Add the debug message here.
}		}
}		}
		ABataev (Not Done): 1. Not formatted. 2. I think it is better to make it `<=`.

		ABataev (Not Done): I think a plain `else` is enough here. If the opcode is not the same or alternate, it is different. And the final criterion may be simplified to `if (SameOrAlt <= VL.size() / 2) return;`
		dtemirbulatov (Author, Not Done): I missed this line with the last change.
// Check if this is a duplicate of another entry.		// Check if this is a duplicate of another entry.
if (TreeEntry *E = getTreeEntry(VL[0])) {		if (TreeEntry *E = getTreeEntry(S.OpValue)) {
for (unsigned i = 0, e = VL.size(); i != e; ++i) {		for (unsigned i = 0, e = VL.size(); i != e; ++i) {
DEBUG(dbgs() << "SLP: \tChecking bundle: " << *VL[i] << ".\n");		DEBUG(dbgs() << "SLP: \tChecking bundle: " << *VL[i] << ".\n");
if (E->Scalars[i] != VL[i]) {		if (E->Scalars[i] != VL[i]) {
DEBUG(dbgs() << "SLP: Gathering due to partial overlap.\n");		DEBUG(dbgs() << "SLP: Gathering due to partial overlap.\n");
newTreeEntry(VL, false, UserTreeIdx);		newTreeEntry(VL, false, UserTreeIdx, S);
return;		return;
}		}
}		}
// Record the reuse of the tree node. FIXME, currently this is only used to		// Record the reuse of the tree node. FIXME, currently this is only used to
// properly draw the graph rather than for the actual vectorization.		// properly draw the graph rather than for the actual vectorization.
E->UserTreeIndices.push_back(UserTreeIdx);		E->UserTreeIndices.push_back(UserTreeIdx);
DEBUG(dbgs() << "SLP: Perfect diamond merge at " << *VL[0] << ".\n");		DEBUG(dbgs() << "SLP: Perfect diamond merge at " << *S.OpValue << ".\n");
return;		return;
}		}

// Check that none of the instructions in the bundle are already in the tree.		// Check that none of the instructions in the bundle are already in the tree.
		unsigned AltOpcode = getAltOpcode(S.Opcode);
for (unsigned i = 0, e = VL.size(); i != e; ++i) {		for (unsigned i = 0, e = VL.size(); i != e; ++i) {
if (ScalarToTreeEntry.count(VL[i])) {		unsigned RealOpcode = (S.IsAltShuffle && isOdd(i)) ? AltOpcode : S.Opcode;
		auto *I = dyn_cast<Instruction>(VL[i]);
		if (!I)
		continue;
		Value *Key = (I->getOpcode() == RealOpcode) ? I : S.OpValue;
		if (getTreeEntry(I, Key)) {
DEBUG(dbgs() << "SLP: The instruction (" << *VL[i] <<		DEBUG(dbgs() << "SLP: The instruction (" << *VL[i] <<
") is already in tree.\n");		") is already in tree.\n");
newTreeEntry(VL, false, UserTreeIdx);		newTreeEntry(VL, false, UserTreeIdx, S);
return;		return;
}		}
}		}

// If any of the scalars is marked as a value that needs to stay scalar then		// If any of the scalars is marked as a value that needs to stay scalar then
// we need to gather the scalars.		// we need to gather the scalars.
for (unsigned i = 0, e = VL.size(); i != e; ++i) {		for (unsigned i = 0, e = VL.size(); i != e; ++i) {
if (MustGather.count(VL[i])) {		if (MustGather.count(VL[i])) {
DEBUG(dbgs() << "SLP: Gathering due to gathered scalar.\n");		DEBUG(dbgs() << "SLP: Gathering due to gathered scalar.\n");
newTreeEntry(VL, false, UserTreeIdx);		newTreeEntry(VL, false, UserTreeIdx, S);
return;		return;
}		}
}		}

// Check that all of the users of the scalars that we want to vectorize are		// Check that all of the users of the scalars that we want to vectorize are
// schedulable.		// schedulable.
Instruction *VL0 = cast<Instruction>(VL[0]);		auto *VL0 = cast<Instruction>(S.OpValue);
		RKSimon (Not Done): NFC pre-commit.
BasicBlock *BB = VL0->getParent();		BasicBlock *BB = VL0->getParent();

if (!DT->isReachableFromEntry(BB)) {		if (!DT->isReachableFromEntry(BB)) {
// Don't go into unreachable blocks. They may contain instructions with		// Don't go into unreachable blocks. They may contain instructions with
// dependency cycles which confuse the final scheduling.		// dependency cycles which confuse the final scheduling.
DEBUG(dbgs() << "SLP: bundle in unreachable block.\n");		DEBUG(dbgs() << "SLP: bundle in unreachable block.\n");
newTreeEntry(VL, false, UserTreeIdx);		newTreeEntry(VL, false, UserTreeIdx, S);
return;		return;
}		}

// Check that every instructions appears once in this bundle.		// Check that every instructions appears once in this bundle.
for (unsigned i = 0, e = VL.size(); i < e; ++i)		for (unsigned i = 0, e = VL.size(); i < e; ++i)
for (unsigned j = i+1; j < e; ++j)		for (unsigned j = i+1; j < e; ++j)
if (VL[i] == VL[j]) {		if (VL[i] == VL[j]) {
DEBUG(dbgs() << "SLP: Scalar used twice in bundle.\n");		DEBUG(dbgs() << "SLP: Scalar used twice in bundle.\n");
newTreeEntry(VL, false, UserTreeIdx);		newTreeEntry(VL, false, UserTreeIdx, S);
return;		return;
}		}

auto &BSRef = BlocksSchedules[BB];		auto &BSRef = BlocksSchedules[BB];
if (!BSRef) {		if (!BSRef) {
BSRef = llvm::make_unique<BlockScheduling>(BB);		BSRef = llvm::make_unique<BlockScheduling>(BB);
}		}
BlockScheduling &BS = *BSRef.get();		BlockScheduling &BS = *BSRef.get();

if (!BS.tryScheduleBundle(VL, this, VL0)) {		if (!BS.tryScheduleBundle(VL, this, S.OpValue)) {
DEBUG(dbgs() << "SLP: We are not able to schedule this bundle!\n");		DEBUG(dbgs() << "SLP: We are not able to schedule this bundle!\n");
assert((!BS.getScheduleData(VL[0]) ||		assert((!BS.getScheduleData(VL0) ||
!BS.getScheduleData(VL[0])->isPartOfBundle()) &&		!BS.getScheduleData(VL0)->isPartOfBundle()) &&
"tryScheduleBundle should cancelScheduling on failure");		"tryScheduleBundle should cancelScheduling on failure");
newTreeEntry(VL, false, UserTreeIdx);		newTreeEntry(VL, false, UserTreeIdx, S);
return;		return;
}		}
DEBUG(dbgs() << "SLP: We are able to schedule this bundle.\n");		DEBUG(dbgs() << "SLP: We are able to schedule this bundle.\n");

switch (Opcode) {		unsigned ShuffleOrOp = S.IsAltShuffle ?
		(unsigned) Instruction::ShuffleVector : S.Opcode;
		switch (ShuffleOrOp) {
		RKSimon (Not Done): Please don't embed this in the switch().
		ABataev (Not Done): Hmm, not sure about this criterion. Maybe a better one is to require that the number of operations with the same or alternative opcode is greater than the number of operations with different opcodes? You'd better investigate this.
case Instruction::PHI: {		case Instruction::PHI: {
PHINode *PH = dyn_cast<PHINode>(VL0);		PHINode *PH = dyn_cast<PHINode>(VL0);
		ABataev: I think you can check at the very beginning of this function.

// Check for terminator values (e.g. invoke).		// Check for terminator values (e.g. invoke).
for (unsigned j = 0; j < VL.size(); ++j)		for (unsigned j = 0; j < VL.size(); ++j)
for (unsigned i = 0, e = PH->getNumIncomingValues(); i < e; ++i) {		for (unsigned i = 0, e = PH->getNumIncomingValues(); i < e; ++i) {
TerminatorInst *Term = dyn_cast<TerminatorInst>(		TerminatorInst *Term = dyn_cast<TerminatorInst>(
cast<PHINode>(VL[j])->getIncomingValueForBlock(PH->getIncomingBlock(i)));		cast<PHINode>(VL[j])->getIncomingValueForBlock(PH->getIncomingBlock(i)));
if (Term) {		if (Term) {
DEBUG(dbgs() << "SLP: Need to swizzle PHINodes (TerminatorInst use).\n");		DEBUG(dbgs() << "SLP: Need to swizzle PHINodes (TerminatorInst use).\n");
BS.cancelScheduling(VL, VL0);		BS.cancelScheduling(VL, VL0);
newTreeEntry(VL, false, UserTreeIdx);		newTreeEntry(VL, false, UserTreeIdx, S);
return;		return;
}		}
}		}

newTreeEntry(VL, true, UserTreeIdx);		newTreeEntry(VL, true, UserTreeIdx, S);
DEBUG(dbgs() << "SLP: added a vector of PHINodes.\n");		DEBUG(dbgs() << "SLP: added a vector of PHINodes.\n");

for (unsigned i = 0, e = PH->getNumIncomingValues(); i < e; ++i) {		for (unsigned i = 0, e = PH->getNumIncomingValues(); i < e; ++i) {
ValueList Operands;		ValueList Operands;
// Prepare the operand vector.		// Prepare the operand vector.
for (Value *j : VL)		for (Value *j : VL)
Operands.push_back(cast<PHINode>(j)->getIncomingValueForBlock(		Operands.push_back(cast<PHINode>(j)->getIncomingValueForBlock(
PH->getIncomingBlock(i)));		PH->getIncomingBlock(i)));

buildTree_rec(Operands, Depth + 1, UserTreeIdx);		buildTree_rec(Operands, Depth + 1, UserTreeIdx);
}		}
return;		return;
}		}
case Instruction::ExtractValue:		case Instruction::ExtractValue:
case Instruction::ExtractElement: {		case Instruction::ExtractElement: {
bool Reuse = canReuseExtract(VL, VL0);		bool Reuse = canReuseExtract(VL, VL0);
if (Reuse) {		if (Reuse) {
DEBUG(dbgs() << "SLP: Reusing extract sequence.\n");		DEBUG(dbgs() << "SLP: Reusing extract sequence.\n");
} else {		} else {
BS.cancelScheduling(VL, VL0);		BS.cancelScheduling(VL, VL0);
}		}
newTreeEntry(VL, Reuse, UserTreeIdx);		newTreeEntry(VL, Reuse, UserTreeIdx, S);
return;		return;
}		}
case Instruction::Load: {		case Instruction::Load: {
// Check that a vectorized load would load the same memory as a scalar		// Check that a vectorized load would load the same memory as a scalar
// load.		// load.
// For example we don't want vectorize loads that are smaller than 8 bit.		// For example we don't want vectorize loads that are smaller than 8 bit.
// Even though we have a packed struct {<i2, i2, i2, i2>} LLVM treats		// Even though we have a packed struct {<i2, i2, i2, i2>} LLVM treats
// loading/storing it as an i8 struct. If we vectorize loads/stores from		// loading/storing it as an i8 struct. If we vectorize loads/stores from
// such a struct we read/write packed bits disagreeing with the		// such a struct we read/write packed bits disagreeing with the
// unvectorized version.		// unvectorized version.
Type *ScalarTy = VL0->getType();		Type *ScalarTy = VL0->getType();

if (DL->getTypeSizeInBits(ScalarTy) !=		if (DL->getTypeSizeInBits(ScalarTy) !=
DL->getTypeAllocSizeInBits(ScalarTy)) {		DL->getTypeAllocSizeInBits(ScalarTy)) {
BS.cancelScheduling(VL, VL0);		BS.cancelScheduling(VL, VL0);
newTreeEntry(VL, false, UserTreeIdx);		newTreeEntry(VL, false, UserTreeIdx, S);
DEBUG(dbgs() << "SLP: Gathering loads of non-packed type.\n");		DEBUG(dbgs() << "SLP: Gathering loads of non-packed type.\n");
return;		return;
}		}

// Make sure all loads in the bundle are simple - we can't vectorize		// Make sure all loads in the bundle are simple - we can't vectorize
// atomic or volatile loads.		// atomic or volatile loads.
for (unsigned i = 0, e = VL.size() - 1; i < e; ++i) {		for (unsigned i = 0, e = VL.size() - 1; i < e; ++i) {
LoadInst *L = cast<LoadInst>(VL[i]);		LoadInst *L = cast<LoadInst>(VL[i]);
if (!L->isSimple()) {		if (!L->isSimple()) {
BS.cancelScheduling(VL, VL0);		BS.cancelScheduling(VL, VL0);
newTreeEntry(VL, false, UserTreeIdx);		newTreeEntry(VL, false, UserTreeIdx, S);
DEBUG(dbgs() << "SLP: Gathering non-simple loads.\n");		DEBUG(dbgs() << "SLP: Gathering non-simple loads.\n");
return;		return;
}		}
}		}

// Check if the loads are consecutive, reversed, or neither.		// Check if the loads are consecutive, reversed, or neither.
// TODO: What we really want is to sort the loads, but for now, check		// TODO: What we really want is to sort the loads, but for now, check
// the two likely directions.		// the two likely directions.
bool Consecutive = true;		bool Consecutive = true;
bool ReverseConsecutive = true;		bool ReverseConsecutive = true;
for (unsigned i = 0, e = VL.size() - 1; i < e; ++i) {		for (unsigned i = 0, e = VL.size() - 1; i < e; ++i) {
if (!isConsecutiveAccess(VL[i], VL[i + 1], DL, SE)) {		if (!isConsecutiveAccess(VL[i], VL[i + 1], DL, SE)) {
Consecutive = false;		Consecutive = false;
break;		break;
} else {		} else {
ReverseConsecutive = false;		ReverseConsecutive = false;
}		}
}		}

if (Consecutive) {		if (Consecutive) {
++NumLoadsWantToKeepOrder;		++NumLoadsWantToKeepOrder;
newTreeEntry(VL, true, UserTreeIdx);		newTreeEntry(VL, true, UserTreeIdx, S);
DEBUG(dbgs() << "SLP: added a vector of loads.\n");		DEBUG(dbgs() << "SLP: added a vector of loads.\n");
return;		return;
}		}

// If none of the load pairs were consecutive when checked in order,		// If none of the load pairs were consecutive when checked in order,
// check the reverse order.		// check the reverse order.
if (ReverseConsecutive)		if (ReverseConsecutive)
for (unsigned i = VL.size() - 1; i > 0; --i)		for (unsigned i = VL.size() - 1; i > 0; --i)
if (!isConsecutiveAccess(VL[i], VL[i - 1], DL, SE)) {		if (!isConsecutiveAccess(VL[i], VL[i - 1], DL, SE)) {
ReverseConsecutive = false;		ReverseConsecutive = false;
break;		break;
}		}

BS.cancelScheduling(VL, VL0);		BS.cancelScheduling(VL, VL0);
newTreeEntry(VL, false, UserTreeIdx);		newTreeEntry(VL, false, UserTreeIdx, S);

if (ReverseConsecutive) {		if (ReverseConsecutive) {
++NumLoadsWantToChangeOrder;		++NumLoadsWantToChangeOrder;
DEBUG(dbgs() << "SLP: Gathering reversed loads.\n");		DEBUG(dbgs() << "SLP: Gathering reversed loads.\n");
} else {		} else {
DEBUG(dbgs() << "SLP: Gathering non-consecutive loads.\n");		DEBUG(dbgs() << "SLP: Gathering non-consecutive loads.\n");
}		}
return;		return;
[10 lines hidden; context: switch (ShuffleOrOp) {]
case Instruction::Trunc:		case Instruction::Trunc:
case Instruction::FPTrunc:		case Instruction::FPTrunc:
case Instruction::BitCast: {		case Instruction::BitCast: {
Type *SrcTy = VL0->getOperand(0)->getType();		Type *SrcTy = VL0->getOperand(0)->getType();
for (unsigned i = 0; i < VL.size(); ++i) {		for (unsigned i = 0; i < VL.size(); ++i) {
Type *Ty = cast<Instruction>(VL[i])->getOperand(0)->getType();		Type *Ty = cast<Instruction>(VL[i])->getOperand(0)->getType();
if (Ty != SrcTy || !isValidElementType(Ty)) {		if (Ty != SrcTy || !isValidElementType(Ty)) {
BS.cancelScheduling(VL, VL0);		BS.cancelScheduling(VL, VL0);
newTreeEntry(VL, false, UserTreeIdx);		newTreeEntry(VL, false, UserTreeIdx, S);
DEBUG(dbgs() << "SLP: Gathering casts with different src types.\n");		DEBUG(dbgs() << "SLP: Gathering casts with different src types.\n");
return;		return;
}		}
}		}
newTreeEntry(VL, true, UserTreeIdx);		newTreeEntry(VL, true, UserTreeIdx, S);
DEBUG(dbgs() << "SLP: added a vector of casts.\n");		DEBUG(dbgs() << "SLP: added a vector of casts.\n");

for (unsigned i = 0, e = VL0->getNumOperands(); i < e; ++i) {		for (unsigned i = 0, e = VL0->getNumOperands(); i < e; ++i) {
ValueList Operands;		ValueList Operands;
// Prepare the operand vector.		// Prepare the operand vector.
for (Value *j : VL)		for (Value *j : VL)
Operands.push_back(cast<Instruction>(j)->getOperand(i));		Operands.push_back(cast<Instruction>(j)->getOperand(i));

buildTree_rec(Operands, Depth + 1, UserTreeIdx);		buildTree_rec(Operands, Depth + 1, UserTreeIdx);
}		}
return;		return;
}		}
case Instruction::ICmp:		case Instruction::ICmp:
case Instruction::FCmp: {		case Instruction::FCmp: {
// Check that all of the compares have the same predicate.		// Check that all of the compares have the same predicate.
CmpInst::Predicate P0 = cast<CmpInst>(VL0)->getPredicate();		CmpInst::Predicate P0 = cast<CmpInst>(VL0)->getPredicate();
Type *ComparedTy = VL0->getOperand(0)->getType();		Type *ComparedTy = VL0->getOperand(0)->getType();
for (unsigned i = 1, e = VL.size(); i < e; ++i) {		for (unsigned i = 1, e = VL.size(); i < e; ++i) {
CmpInst *Cmp = cast<CmpInst>(VL[i]);		CmpInst *Cmp = cast<CmpInst>(VL[i]);
if (Cmp->getPredicate() != P0 ||		if (Cmp->getPredicate() != P0 ||
Cmp->getOperand(0)->getType() != ComparedTy) {		Cmp->getOperand(0)->getType() != ComparedTy) {
BS.cancelScheduling(VL, VL0);		BS.cancelScheduling(VL, VL0);
newTreeEntry(VL, false, UserTreeIdx);		newTreeEntry(VL, false, UserTreeIdx, S);
DEBUG(dbgs() << "SLP: Gathering cmp with different predicate.\n");		DEBUG(dbgs() << "SLP: Gathering cmp with different predicate.\n");
return;		return;
}		}
}		}

newTreeEntry(VL, true, UserTreeIdx);		newTreeEntry(VL, true, UserTreeIdx, S);
DEBUG(dbgs() << "SLP: added a vector of compares.\n");		DEBUG(dbgs() << "SLP: added a vector of compares.\n");

for (unsigned i = 0, e = VL0->getNumOperands(); i < e; ++i) {		for (unsigned i = 0, e = VL0->getNumOperands(); i < e; ++i) {
ValueList Operands;		ValueList Operands;
// Prepare the operand vector.		// Prepare the operand vector.
for (Value *j : VL)		for (Value *j : VL)
Operands.push_back(cast<Instruction>(j)->getOperand(i));		Operands.push_back(cast<Instruction>(j)->getOperand(i));

[15 lines hidden; context: switch (ShuffleOrOp) {]
case Instruction::SRem:		case Instruction::SRem:
case Instruction::FRem:		case Instruction::FRem:
case Instruction::Shl:		case Instruction::Shl:
case Instruction::LShr:		case Instruction::LShr:
case Instruction::AShr:		case Instruction::AShr:
case Instruction::And:		case Instruction::And:
case Instruction::Or:		case Instruction::Or:
case Instruction::Xor: {		case Instruction::Xor: {
newTreeEntry(VL, true, UserTreeIdx);		newTreeEntry(VL, true, UserTreeIdx, S);
DEBUG(dbgs() << "SLP: added a vector of bin op.\n");		DEBUG(dbgs() << "SLP: added a vector of bin op.\n");
		dtemirbulatov (author): Here is the solution for the second issue related to the last commit attempt; the test case for that issue is load-dominate.ll.

// Sort operands of the instructions so that each side is more likely to		// Sort operands of the instructions so that each side is more likely to
// have the same opcode.		// have the same opcode.
if (isa<BinaryOperator>(VL0) && VL0->isCommutative()) {		if (isa<BinaryOperator>(VL0) && VL0->isCommutative()) {
ValueList Left, Right;		ValueList Left, Right;
reorderInputsAccordingToOpcode(VL, Left, Right);		reorderInputsAccordingToOpcode(S.Opcode, VL, Left, Right);
buildTree_rec(Left, Depth + 1, UserTreeIdx);		buildTree_rec(Left, Depth + 1, UserTreeIdx);
buildTree_rec(Right, Depth + 1, UserTreeIdx);		buildTree_rec(Right, Depth + 1, UserTreeIdx);
return;		return;
}		}

for (unsigned i = 0, e = VL0->getNumOperands(); i < e; ++i) {		for (unsigned i = 0, e = VL0->getNumOperands(); i < e; ++i) {
ValueList Operands;		ValueList Operands;
// Prepare the operand vector.		// Prepare the operand vector.
for (Value *j : VL)		for (Value *VecOp : VL) {
Operands.push_back(cast<Instruction>(j)->getOperand(i));		auto *I = cast<Instruction>(VecOp);
		Value *Operand;
		if (I->getOpcode() != S.Opcode) {
		assert(Instruction::isBinaryOp(S.Opcode) &&
		"Expected a binary operation.");
		Operand = isOdd(i)
		? getDefaultConstantForOpcode(S.Opcode, I->getType())
		: VecOp;
		} else
		Operand = I->getOperand(i);
		Operands.push_back(Operand);
		RKSimon: Possible tidyup:
		    if (I->getOpcode() == S.Opcode) {
		      Operands.push_back(I->getOperand(i));
		      continue;
		    }
		    assert(Instruction::isBinaryOp(S.Opcode) &&
		           "Expected a binary operation.");
		    Value *Operand = isOdd(i)
		        ? getDefaultConstantForOpcode(S.Opcode, I->getType())
		        : VecOp;
		    Operands.push_back(Operand);
		}
		RKSimon: Do the j->V rename as an NFC pre-commit.
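For readers following the operand loop above: a lane whose opcode differs from the bundle's main opcode is treated as "lane op constant", so it contributes itself as operand 0 and a default constant (presumably the identity for that opcode) as operand 1. The block below is a minimal sketch of that idea on toy types, not the patch's code; `Val`, `Op`, and `operandColumn` are hypothetical names invented for illustration and do not exist in LLVM.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins for IR values (hypothetical; not LLVM types).
enum class Op { Load, Add };

struct Val {
  Op Opcode;
  std::string Name;                  // name of the value this lane defines
  std::vector<std::string> Operands; // names of its operands
};

// Collect operand number Idx of every lane in a bundle whose main opcode is
// Main. A lane with a different opcode is "copyable": it is modeled as
// `lane <Main> Identity`, so operand 0 is the lane itself and operand 1 is the
// identity constant.
std::vector<std::string> operandColumn(const std::vector<Val> &Bundle, Op Main,
                                       unsigned Idx,
                                       const std::string &Identity) {
  std::vector<std::string> Column;
  for (const Val &V : Bundle) {
    if (V.Opcode == Main) {
      Column.push_back(V.Operands[Idx]);
      continue;
    }
    Column.push_back(Idx % 2 ? Identity : V.Name);
  }
  return Column;
}
```

Applied to the `add1` example from the summary (one plain load followed by three adds), this yields a left column made of the four loaded values and a right column of the constants 0, 1, 2, 3, which is the shape a single vector add needs.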

buildTree_rec(Operands, Depth + 1, UserTreeIdx);		buildTree_rec(Operands, Depth + 1, UserTreeIdx);
}		}
return;		return;
}		}
case Instruction::GetElementPtr: {		case Instruction::GetElementPtr: {
// We don't combine GEPs with complicated (nested) indexing.		// We don't combine GEPs with complicated (nested) indexing.
for (unsigned j = 0; j < VL.size(); ++j) {		for (unsigned j = 0; j < VL.size(); ++j) {
if (cast<Instruction>(VL[j])->getNumOperands() != 2) {		if (cast<Instruction>(VL[j])->getNumOperands() != 2) {
DEBUG(dbgs() << "SLP: not-vectorizable GEP (nested indexes).\n");		DEBUG(dbgs() << "SLP: not-vectorizable GEP (nested indexes).\n");
BS.cancelScheduling(VL, VL0);		BS.cancelScheduling(VL, VL0);
newTreeEntry(VL, false, UserTreeIdx);		newTreeEntry(VL, false, UserTreeIdx, S);
return;		return;
}		}
}		}

// We can't combine several GEPs into one vector if they operate on		// We can't combine several GEPs into one vector if they operate on
// different types.		// different types.
Type *Ty0 = VL0->getOperand(0)->getType();		Type *Ty0 = VL0->getOperand(0)->getType();
for (unsigned j = 0; j < VL.size(); ++j) {		for (unsigned j = 0; j < VL.size(); ++j) {
Type *CurTy = cast<Instruction>(VL[j])->getOperand(0)->getType();		Type *CurTy = cast<Instruction>(VL[j])->getOperand(0)->getType();
if (Ty0 != CurTy) {		if (Ty0 != CurTy) {
DEBUG(dbgs() << "SLP: not-vectorizable GEP (different types).\n");		DEBUG(dbgs() << "SLP: not-vectorizable GEP (different types).\n");
BS.cancelScheduling(VL, VL0);		BS.cancelScheduling(VL, VL0);
newTreeEntry(VL, false, UserTreeIdx);		newTreeEntry(VL, false, UserTreeIdx, S);
return;		return;
}		}
}		}

// We don't combine GEPs with non-constant indexes.		// We don't combine GEPs with non-constant indexes.
for (unsigned j = 0; j < VL.size(); ++j) {		for (unsigned j = 0; j < VL.size(); ++j) {
auto Op = cast<Instruction>(VL[j])->getOperand(1);		auto Op = cast<Instruction>(VL[j])->getOperand(1);
if (!isa<ConstantInt>(Op)) {		if (!isa<ConstantInt>(Op)) {
DEBUG(		DEBUG(
dbgs() << "SLP: not-vectorizable GEP (non-constant indexes).\n");		dbgs() << "SLP: not-vectorizable GEP (non-constant indexes).\n");
BS.cancelScheduling(VL, VL0);		BS.cancelScheduling(VL, VL0);
newTreeEntry(VL, false, UserTreeIdx);		newTreeEntry(VL, false, UserTreeIdx, S);
return;		return;
}		}
}		}

newTreeEntry(VL, true, UserTreeIdx);		newTreeEntry(VL, true, UserTreeIdx, S);
DEBUG(dbgs() << "SLP: added a vector of GEPs.\n");		DEBUG(dbgs() << "SLP: added a vector of GEPs.\n");
for (unsigned i = 0, e = 2; i < e; ++i) {		for (unsigned i = 0, e = 2; i < e; ++i) {
ValueList Operands;		ValueList Operands;
// Prepare the operand vector.		// Prepare the operand vector.
for (Value *j : VL)		for (Value *j : VL)
Operands.push_back(cast<Instruction>(j)->getOperand(i));		Operands.push_back(cast<Instruction>(j)->getOperand(i));

buildTree_rec(Operands, Depth + 1, UserTreeIdx);		buildTree_rec(Operands, Depth + 1, UserTreeIdx);
}		}
return;		return;
}		}
case Instruction::Store: {		case Instruction::Store: {
// Check if the stores are consecutive or of we need to swizzle them.		// Check if the stores are consecutive or of we need to swizzle them.
for (unsigned i = 0, e = VL.size() - 1; i < e; ++i)		for (unsigned i = 0, e = VL.size() - 1; i < e; ++i)
if (!isConsecutiveAccess(VL[i], VL[i + 1], DL, SE)) {		if (!isConsecutiveAccess(VL[i], VL[i + 1], DL, SE)) {
BS.cancelScheduling(VL, VL0);		BS.cancelScheduling(VL, VL0);
newTreeEntry(VL, false, UserTreeIdx);		newTreeEntry(VL, false, UserTreeIdx, S);
DEBUG(dbgs() << "SLP: Non-consecutive store.\n");		DEBUG(dbgs() << "SLP: Non-consecutive store.\n");
return;		return;
}		}

newTreeEntry(VL, true, UserTreeIdx);		newTreeEntry(VL, true, UserTreeIdx, S);
DEBUG(dbgs() << "SLP: added a vector of stores.\n");		DEBUG(dbgs() << "SLP: added a vector of stores.\n");

ValueList Operands;		ValueList Operands;
for (Value *j : VL)		for (Value *j : VL)
Operands.push_back(cast<Instruction>(j)->getOperand(0));		Operands.push_back(cast<Instruction>(j)->getOperand(0));

buildTree_rec(Operands, Depth + 1, UserTreeIdx);		buildTree_rec(Operands, Depth + 1, UserTreeIdx);
return;		return;
}		}
case Instruction::Call: {		case Instruction::Call: {
// Check if the calls are all to the same vectorizable intrinsic.		// Check if the calls are all to the same vectorizable intrinsic.
CallInst *CI = cast<CallInst>(VL0);		CallInst *CI = cast<CallInst>(VL0);
// Check if this is an Intrinsic call or something that can be		// Check if this is an Intrinsic call or something that can be
// represented by an intrinsic call		// represented by an intrinsic call
Intrinsic::ID ID = getVectorIntrinsicIDForCall(CI, TLI);		Intrinsic::ID ID = getVectorIntrinsicIDForCall(CI, TLI);
if (!isTriviallyVectorizable(ID)) {		if (!isTriviallyVectorizable(ID)) {
BS.cancelScheduling(VL, VL0);		BS.cancelScheduling(VL, VL0);
newTreeEntry(VL, false, UserTreeIdx);		newTreeEntry(VL, false, UserTreeIdx, S);
DEBUG(dbgs() << "SLP: Non-vectorizable call.\n");		DEBUG(dbgs() << "SLP: Non-vectorizable call.\n");
return;		return;
}		}
Function *Int = CI->getCalledFunction();		Function *Int = CI->getCalledFunction();
Value *A1I = nullptr;		Value *A1I = nullptr;
if (hasVectorInstrinsicScalarOpd(ID, 1))		if (hasVectorInstrinsicScalarOpd(ID, 1))
A1I = CI->getArgOperand(1);		A1I = CI->getArgOperand(1);
for (unsigned i = 1, e = VL.size(); i != e; ++i) {		for (unsigned i = 1, e = VL.size(); i != e; ++i) {
CallInst *CI2 = dyn_cast<CallInst>(VL[i]);		CallInst *CI2 = dyn_cast<CallInst>(VL[i]);
if (!CI2 || CI2->getCalledFunction() != Int ||		if (!CI2 || CI2->getCalledFunction() != Int ||
getVectorIntrinsicIDForCall(CI2, TLI) != ID ||		getVectorIntrinsicIDForCall(CI2, TLI) != ID ||
!CI->hasIdenticalOperandBundleSchema(*CI2)) {		!CI->hasIdenticalOperandBundleSchema(*CI2)) {
BS.cancelScheduling(VL, VL0);		BS.cancelScheduling(VL, VL0);
newTreeEntry(VL, false, UserTreeIdx);		newTreeEntry(VL, false, UserTreeIdx, S);
DEBUG(dbgs() << "SLP: mismatched calls:" << CI << "!=" << VL[i]		DEBUG(dbgs() << "SLP: mismatched calls:" << CI << "!=" << VL[i]
<< "\n");		<< "\n");
return;		return;
}		}
// ctlz,cttz and powi are special intrinsics whose second argument		// ctlz,cttz and powi are special intrinsics whose second argument
// should be same in order for them to be vectorized.		// should be same in order for them to be vectorized.
if (hasVectorInstrinsicScalarOpd(ID, 1)) {		if (hasVectorInstrinsicScalarOpd(ID, 1)) {
Value *A1J = CI2->getArgOperand(1);		Value *A1J = CI2->getArgOperand(1);
if (A1I != A1J) {		if (A1I != A1J) {
BS.cancelScheduling(VL, VL0);		BS.cancelScheduling(VL, VL0);
newTreeEntry(VL, false, UserTreeIdx);		newTreeEntry(VL, false, UserTreeIdx, S);
DEBUG(dbgs() << "SLP: mismatched arguments in call:" << *CI		DEBUG(dbgs() << "SLP: mismatched arguments in call:" << *CI
<< " argument "<< A1I<<"!=" << A1J		<< " argument "<< A1I<<"!=" << A1J
<< "\n");		<< "\n");
return;		return;
}		}
}		}
// Verify that the bundle operands are identical between the two calls.		// Verify that the bundle operands are identical between the two calls.
if (CI->hasOperandBundles() &&		if (CI->hasOperandBundles() &&
!std::equal(CI->op_begin() + CI->getBundleOperandsStartIndex(),		!std::equal(CI->op_begin() + CI->getBundleOperandsStartIndex(),
CI->op_begin() + CI->getBundleOperandsEndIndex(),		CI->op_begin() + CI->getBundleOperandsEndIndex(),
CI2->op_begin() + CI2->getBundleOperandsStartIndex())) {		CI2->op_begin() + CI2->getBundleOperandsStartIndex())) {
BS.cancelScheduling(VL, VL0);		BS.cancelScheduling(VL, VL0);
newTreeEntry(VL, false, UserTreeIdx);		newTreeEntry(VL, false, UserTreeIdx, S);
DEBUG(dbgs() << "SLP: mismatched bundle operands in calls:" << *CI << "!="		DEBUG(dbgs() << "SLP: mismatched bundle operands in calls:" << *CI << "!="
<< *VL[i] << '\n');		<< *VL[i] << '\n');
return;		return;
}		}
}		}

newTreeEntry(VL, true, UserTreeIdx);		newTreeEntry(VL, true, UserTreeIdx, S);
for (unsigned i = 0, e = CI->getNumArgOperands(); i != e; ++i) {		for (unsigned i = 0, e = CI->getNumArgOperands(); i != e; ++i) {
ValueList Operands;		ValueList Operands;
// Prepare the operand vector.		// Prepare the operand vector.
for (Value *j : VL) {		for (Value *j : VL) {
CallInst *CI2 = dyn_cast<CallInst>(j);		CallInst *CI2 = dyn_cast<CallInst>(j);
Operands.push_back(CI2->getArgOperand(i));		Operands.push_back(CI2->getArgOperand(i));
}		}
buildTree_rec(Operands, Depth + 1, UserTreeIdx);		buildTree_rec(Operands, Depth + 1, UserTreeIdx);
}		}
return;		return;
}		}
case Instruction::ShuffleVector: {		case Instruction::ShuffleVector: {
// If this is not an alternate sequence of opcode like add-sub		// If this is not an alternate sequence of opcode like add-sub
// then do not vectorize this instruction.		// then do not vectorize this instruction.
if (!isAltShuffle) {		if (!S.IsAltShuffle) {
BS.cancelScheduling(VL, VL0);		BS.cancelScheduling(VL, VL0);
newTreeEntry(VL, false, UserTreeIdx);		newTreeEntry(VL, false, UserTreeIdx, S);
DEBUG(dbgs() << "SLP: ShuffleVector are not vectorized.\n");		DEBUG(dbgs() << "SLP: ShuffleVector are not vectorized.\n");
return;		return;
}		}
newTreeEntry(VL, true, UserTreeIdx);		newTreeEntry(VL, true, UserTreeIdx, S);
DEBUG(dbgs() << "SLP: added a ShuffleVector op.\n");		DEBUG(dbgs() << "SLP: added a ShuffleVector op.\n");

// Reorder operands if reordering would enable vectorization.		// Reorder operands if reordering would enable vectorization.
if (isa<BinaryOperator>(VL0)) {		if (isa<BinaryOperator>(VL0)) {
ValueList Left, Right;		ValueList Left, Right;
reorderAltShuffleOperands(VL, Left, Right);		reorderAltShuffleOperands(S.Opcode, VL, Left, Right);
buildTree_rec(Left, Depth + 1, UserTreeIdx);		buildTree_rec(Left, Depth + 1, UserTreeIdx);
buildTree_rec(Right, Depth + 1, UserTreeIdx);		buildTree_rec(Right, Depth + 1, UserTreeIdx);
return;		return;
}		}

for (unsigned i = 0, e = VL0->getNumOperands(); i < e; ++i) {		for (unsigned i = 0, e = VL0->getNumOperands(); i < e; ++i) {
ValueList Operands;		ValueList Operands;
// Prepare the operand vector.		// Prepare the operand vector.
for (Value *j : VL)		for (Value *VecOp : VL) {
Operands.push_back(cast<Instruction>(j)->getOperand(i));		auto *I = cast<Instruction>(VecOp);
		Value *Operand;
		if (!sameOpcodeOrAlt(S.Opcode, AltOpcode, I->getOpcode())) {
		assert(Instruction::isBinaryOp(S.Opcode) &&
		RKSimon: Always add an assert message.
		"Expected a binary operation.");
		Operand = isOdd(i)
		? getDefaultConstantForOpcode(S.Opcode, I->getType())
		: VecOp;
		} else
		Operand = I->getOperand(i);
		Operands.push_back(Operand);
		RKSimon: Tidyup similar to above?
		}

buildTree_rec(Operands, Depth + 1, UserTreeIdx);		buildTree_rec(Operands, Depth + 1, UserTreeIdx);
}		}
return;		return;
}		}
default:		default:
BS.cancelScheduling(VL, VL0);		BS.cancelScheduling(VL, VL0);
newTreeEntry(VL, false, UserTreeIdx);		newTreeEntry(VL, false, UserTreeIdx, S);
DEBUG(dbgs() << "SLP: Gathering unknown instruction.\n");		DEBUG(dbgs() << "SLP: Gathering unknown instruction.\n");
return;		return;
}		}
}		}

unsigned BoUpSLP::canMapToVector(Type *T, const DataLayout &DL) const {		unsigned BoUpSLP::canMapToVector(Type *T, const DataLayout &DL) const {
unsigned N;		unsigned N;
Type *EltTy;		Type *EltTy;
[18 lines hidden; context: unsigned BoUpSLP::canMapToVector(Type *T, const DataLayout &DL) const {]
}		}
return N;		return N;
}		}

bool BoUpSLP::canReuseExtract(ArrayRef<Value *> VL, Value *OpValue) const {		bool BoUpSLP::canReuseExtract(ArrayRef<Value *> VL, Value *OpValue) const {
Instruction *E0 = cast<Instruction>(OpValue);		Instruction *E0 = cast<Instruction>(OpValue);
assert(E0->getOpcode() == Instruction::ExtractElement ||		assert(E0->getOpcode() == Instruction::ExtractElement ||
E0->getOpcode() == Instruction::ExtractValue);		E0->getOpcode() == Instruction::ExtractValue);
assert(E0->getOpcode() == getSameOpcode(VL) && "Invalid opcode");		assert(E0->getOpcode() == getSameOpcode(VL).Opcode && "Invalid opcode");
// Check if all of the extracts come from the same vector and from the		// Check if all of the extracts come from the same vector and from the
// correct offset.		// correct offset.
Value *Vec = E0->getOperand(0);		Value *Vec = E0->getOperand(0);

// We have to extract from a vector/aggregate with the same number of elements.		// We have to extract from a vector/aggregate with the same number of elements.
unsigned NElts;		unsigned NElts;
if (E0->getOpcode() == Instruction::ExtractValue) {		if (E0->getOpcode() == Instruction::ExtractValue) {
const DataLayout &DL = E0->getModule()->getDataLayout();		const DataLayout &DL = E0->getModule()->getDataLayout();
[42 lines hidden; context: int BoUpSLP::getEntryCost(TreeEntry *E) {]
if (E->NeedToGather) {		if (E->NeedToGather) {
if (allConstant(VL))		if (allConstant(VL))
return 0;		return 0;
if (isSplat(VL)) {		if (isSplat(VL)) {
return TTI->getShuffleCost(TargetTransformInfo::SK_Broadcast, VecTy, 0);		return TTI->getShuffleCost(TargetTransformInfo::SK_Broadcast, VecTy, 0);
}		}
return getGatherCost(E->Scalars);		return getGatherCost(E->Scalars);
}		}
unsigned Opcode = getSameOpcode(VL);		assert(E->State.Opcode && allSameType(VL) && allSameBlock(VL) && "Invalid VL");
		RKSimon: auto *VL0
assert(Opcode && allSameType(VL) && allSameBlock(VL) && "Invalid VL");		Instruction *VL0 = cast<Instruction>(E->State.OpValue);
Instruction *VL0 = cast<Instruction>(VL[0]);		unsigned ShuffleOrOp = E->State.IsAltShuffle ?
switch (Opcode) {		(unsigned) Instruction::ShuffleVector : E->State.Opcode;
		RKSimon: Please don't embed this in the switch().
		switch (ShuffleOrOp) {
case Instruction::PHI: {		case Instruction::PHI: {
return 0;		return 0;
}		}
case Instruction::ExtractValue:		case Instruction::ExtractValue:
case Instruction::ExtractElement: {		case Instruction::ExtractElement: {
if (canReuseExtract(VL, VL0)) {		if (canReuseExtract(VL, E->State.OpValue)) {
		ABataev: You can use `VL0` instead of `E->State.OpValue`.
		dtemirbulatov (author): Done.
int DeadCost = 0;		int DeadCost = 0;
for (unsigned i = 0, e = VL.size(); i < e; ++i) {		for (unsigned i = 0, e = VL.size(); i < e; ++i) {
Instruction *E = cast<Instruction>(VL[i]);		Instruction *E = cast<Instruction>(VL[i]);
// If all users are going to be vectorized, instruction can be		// If all users are going to be vectorized, instruction can be
// considered as dead.		// considered as dead.
// The same, if have only one user, it will be vectorized for sure.		// The same, if have only one user, it will be vectorized for sure.
if (E->hasOneUse() ||		if (E->hasOneUse() ||
std::all_of(E->user_begin(), E->user_end(), [this](User *U) {		std::all_of(E->user_begin(), E->user_end(), [this](User *U) {
[30 lines hidden; context: case Instruction::BitCast: {]
return VecCost - ScalarCost;		return VecCost - ScalarCost;
}		}
case Instruction::FCmp:		case Instruction::FCmp:
case Instruction::ICmp:		case Instruction::ICmp:
case Instruction::Select: {		case Instruction::Select: {
// Calculate the cost of this instruction.		// Calculate the cost of this instruction.
VectorType *MaskTy = VectorType::get(Builder.getInt1Ty(), VL.size());		VectorType *MaskTy = VectorType::get(Builder.getInt1Ty(), VL.size());
int ScalarCost = VecTy->getNumElements() *		int ScalarCost = VecTy->getNumElements() *
TTI->getCmpSelInstrCost(Opcode, ScalarTy, Builder.getInt1Ty(), VL0);		TTI->getCmpSelInstrCost(ShuffleOrOp, ScalarTy, Builder.getInt1Ty(), VL0);
int VecCost = TTI->getCmpSelInstrCost(Opcode, VecTy, MaskTy, VL0);		int VecCost = TTI->getCmpSelInstrCost(ShuffleOrOp, VecTy, MaskTy, VL0);
return VecCost - ScalarCost;		return VecCost - ScalarCost;
}		}
case Instruction::Add:		case Instruction::Add:
case Instruction::FAdd:		case Instruction::FAdd:
case Instruction::Sub:		case Instruction::Sub:
case Instruction::FSub:		case Instruction::FSub:
case Instruction::Mul:		case Instruction::Mul:
case Instruction::FMul:		case Instruction::FMul:
[20 lines hidden; context: case Instruction::Xor: {]
TargetTransformInfo::OperandValueProperties Op2VP =		TargetTransformInfo::OperandValueProperties Op2VP =
TargetTransformInfo::OP_None;		TargetTransformInfo::OP_None;

// If all operands are exactly the same ConstantInt then set the		// If all operands are exactly the same ConstantInt then set the
// operand kind to OK_UniformConstantValue.		// operand kind to OK_UniformConstantValue.
// If instead not all operands are constants, then set the operand kind		// If instead not all operands are constants, then set the operand kind
// to OK_AnyValue. If all operands are constants but not the same,		// to OK_AnyValue. If all operands are constants but not the same,
// then set the operand kind to OK_NonUniformConstantValue.		// then set the operand kind to OK_NonUniformConstantValue.
ConstantInt *CInt = nullptr;		if (auto *CInt = dyn_cast<ConstantInt>(VL0->getOperand(1))) {
for (unsigned i = 0; i < VL.size(); ++i) {		const unsigned Opcode = E->State.Opcode;
const Instruction *I = cast<Instruction>(VL[i]);		for (auto *V : VL) {
		auto *I = cast<Instruction>(V);
		if (I == VL0 || Opcode != I->getOpcode())
		continue;
if (!isa<ConstantInt>(I->getOperand(1))) {		if (!isa<ConstantInt>(I->getOperand(1))) {
Op2VK = TargetTransformInfo::OK_AnyValue;		Op2VK = TargetTransformInfo::OK_AnyValue;
break;		break;
}		}
if (i == 0) {
CInt = cast<ConstantInt>(I->getOperand(1));
continue;
}
if (Op2VK == TargetTransformInfo::OK_UniformConstantValue &&		if (Op2VK == TargetTransformInfo::OK_UniformConstantValue &&
CInt != cast<ConstantInt>(I->getOperand(1)))		CInt != cast<ConstantInt>(I->getOperand(1)))
Op2VK = TargetTransformInfo::OK_NonUniformConstantValue;		Op2VK = TargetTransformInfo::OK_NonUniformConstantValue;
}		}
// FIXME: Currently cost of model modification for division by power of		// FIXME: Currently cost of model modification for division by power of
// 2 is handled for X86 and AArch64. Add support for other targets.		// 2 is handled for X86 and AArch64. Add support for other targets.
if (Op2VK == TargetTransformInfo::OK_UniformConstantValue && CInt &&		if (Op2VK == TargetTransformInfo::OK_UniformConstantValue &&
CInt->getValue().isPowerOf2())		CInt->getValue().isPowerOf2())
Op2VP = TargetTransformInfo::OP_PowerOf2;		Op2VP = TargetTransformInfo::OP_PowerOf2;
		} else
		Op2VK = TargetTransformInfo::OK_AnyValue;
		RKSimon: This setting of Op2VK is a bit confusing with dangling if-else cases - why not set it to OK_AnyValue by default and try to make it OK_UniformConstantValue (maybe just at the start of the 'if (auto *CInt = dyn_cast<ConstantInt>(VL0->getOperand(1))) {')?
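One way to read the suggestion above is as a classifier that starts from the "any value" default and only upgrades the kind when the first lane's second operand is a constant. The block below is a rough sketch under that reading, using toy types rather than `TargetTransformInfo`; `OperandKind` and `classifySecondOperand` are hypothetical names, and the sketch ignores the patch's skipping of lanes whose opcode differs from the main one.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Toy mirror of the three operand kinds discussed in the comment above
// (hypothetical; not the TargetTransformInfo enum).
enum class OperandKind { AnyValue, UniformConstant, NonUniformConstant };

// Each entry is one lane's second operand: its constant value, or
// std::nullopt when that operand is not a constant.
OperandKind
classifySecondOperand(const std::vector<std::optional<long>> &Lanes) {
  // Default: assume nothing about the operand.
  OperandKind Kind = OperandKind::AnyValue;
  if (Lanes.empty() || !Lanes.front())
    return Kind;

  // The first lane is a constant, so optimistically assume a uniform constant.
  Kind = OperandKind::UniformConstant;
  for (const std::optional<long> &Lane : Lanes) {
    if (!Lane)
      return OperandKind::AnyValue; // one non-constant lane spoils it
    if (*Lane != *Lanes.front())
      Kind = OperandKind::NonUniformConstant;
  }
  return Kind;
}
```

With the default set once at the top, there is no trailing `else` branch left to keep in sync with the loop body.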

SmallVector<const Value *, 4> Operands(VL0->operand_values());		int ScalarCost = VecTy->getNumElements() *
int ScalarCost =		TTI->getArithmeticInstrCost(E->State.Opcode, ScalarTy,
VecTy->getNumElements() *		Op1VK, Op2VK, Op1VP, Op2VP);
TTI->getArithmeticInstrCost(Opcode, ScalarTy, Op1VK, Op2VK, Op1VP,		int VecCost = TTI->getArithmeticInstrCost(E->State.Opcode, VecTy, Op1VK,
Op2VP, Operands);		Op2VK, Op1VP, Op2VP);
int VecCost = TTI->getArithmeticInstrCost(Opcode, VecTy, Op1VK, Op2VK,
Op1VP, Op2VP, Operands);
return VecCost - ScalarCost;		return VecCost - ScalarCost;
}		}
case Instruction::GetElementPtr: {		case Instruction::GetElementPtr: {
TargetTransformInfo::OperandValueKind Op1VK =		TargetTransformInfo::OperandValueKind Op1VK =
TargetTransformInfo::OK_AnyValue;		TargetTransformInfo::OK_AnyValue;
TargetTransformInfo::OperandValueKind Op2VK =		TargetTransformInfo::OperandValueKind Op2VK =
TargetTransformInfo::OK_UniformConstantValue;		TargetTransformInfo::OK_UniformConstantValue;

[49 lines hidden]	case Instruction::Call: {

return VecCallCost - ScalarCallCost;		return VecCallCost - ScalarCallCost;
}		}
case Instruction::ShuffleVector: {		case Instruction::ShuffleVector: {
TargetTransformInfo::OperandValueKind Op1VK =		TargetTransformInfo::OperandValueKind Op1VK =
TargetTransformInfo::OK_AnyValue;		TargetTransformInfo::OK_AnyValue;
TargetTransformInfo::OperandValueKind Op2VK =		TargetTransformInfo::OperandValueKind Op2VK =
TargetTransformInfo::OK_AnyValue;		TargetTransformInfo::OK_AnyValue;
int ScalarCost = 0;		unsigned AltOpcode = getAltOpcode(E->State.Opcode);
int VecCost = 0;		int ScalarCost =
for (Value *i : VL) {		TTI->getArithmeticInstrCost(E->State.Opcode, ScalarTy, Op1VK, Op2VK) *
Instruction *I = cast<Instruction>(i);		VL.size() / 2;
if (!I)
break;
ScalarCost +=		ScalarCost +=
TTI->getArithmeticInstrCost(I->getOpcode(), ScalarTy, Op1VK, Op2VK);		TTI->getArithmeticInstrCost(AltOpcode, ScalarTy, Op1VK, Op2VK) *
}		VL.size() / 2;
// VecCost is equal to sum of the cost of creating 2 vectors		// VecCost is equal to sum of the cost of creating 2 vectors
// and the cost of creating shuffle.		// and the cost of creating shuffle.
Instruction *I0 = cast<Instruction>(VL[0]);		int VecCost =
VecCost =		TTI->getArithmeticInstrCost(E->State.Opcode, VecTy, Op1VK, Op2VK);
TTI->getArithmeticInstrCost(I0->getOpcode(), VecTy, Op1VK, Op2VK);		VecCost += TTI->getArithmeticInstrCost(AltOpcode, VecTy, Op1VK, Op2VK);
Instruction *I1 = cast<Instruction>(VL[1]);
VecCost +=
TTI->getArithmeticInstrCost(I1->getOpcode(), VecTy, Op1VK, Op2VK);
VecCost +=		VecCost +=
TTI->getShuffleCost(TargetTransformInfo::SK_Alternate, VecTy, 0);		TTI->getShuffleCost(TargetTransformInfo::SK_Alternate, VecTy, 0);
return VecCost - ScalarCost;		return VecCost - ScalarCost;
}		}
default:		default:
llvm_unreachable("Unknown instruction");		llvm_unreachable("Unknown instruction");
}		}
}		}
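The alternate-shuffle cost computed in the `ShuffleVector` case above can be sketched as plain arithmetic. This is an illustration only: the `AltShuffleCosts` struct and `altShuffleCostDelta` function are hypothetical, and the per-instruction costs are passed in directly rather than queried from `TTI`:

```cpp
#include <cassert>

// Hypothetical bundle of per-instruction costs; in the patch these come
// from TTI->getArithmeticInstrCost and TTI->getShuffleCost.
struct AltShuffleCosts {
  int ScalarOp;  // cost of one scalar instruction with the main opcode
  int ScalarAlt; // cost of one scalar instruction with the alternate opcode
  int VectorOp;  // cost of the vector instruction with the main opcode
  int VectorAlt; // cost of the vector instruction with the alternate opcode
  int Shuffle;   // cost of the blending shuffle
};

// Returns VecCost - ScalarCost; a negative result favors vectorization.
int altShuffleCostDelta(const AltShuffleCosts &C, unsigned NumLanes) {
  const int Lanes = static_cast<int>(NumLanes);
  // Half of the lanes use the main opcode, the other half the alternate one.
  const int ScalarCost = C.ScalarOp * Lanes / 2 + C.ScalarAlt * Lanes / 2;
  // Two full-width vector operations plus one shuffle to interleave them.
  const int VecCost = C.VectorOp + C.VectorAlt + C.Shuffle;
  return VecCost - ScalarCost;
}
```

For example, with every cost equal to 1 and four lanes, the scalar side costs 4 and the vector side costs 3, so the delta is -1.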
[50 lines hidden]	int BoUpSLP::getSpillCost() {
// (for example, if spills and fills are required).		// (for example, if spills and fills are required).
unsigned BundleWidth = VectorizableTree.front().Scalars.size();		unsigned BundleWidth = VectorizableTree.front().Scalars.size();
int Cost = 0;		int Cost = 0;

SmallPtrSet<Instruction*, 4> LiveValues;		SmallPtrSet<Instruction*, 4> LiveValues;
Instruction *PrevInst = nullptr;		Instruction *PrevInst = nullptr;

for (const auto &N : VectorizableTree) {		for (const auto &N : VectorizableTree) {
Instruction *Inst = dyn_cast<Instruction>(N.Scalars[0]);		Instruction *Inst = dyn_cast<Instruction>(N.State.OpValue);
		ABataev: `auto *Inst`
		dtemirbulatov (Author): Done.
if (!Inst)		if (!Inst)
continue;		continue;

if (!PrevInst) {		if (!PrevInst) {
PrevInst = Inst;		PrevInst = Inst;
continue;		continue;
}		}

[43 lines hidden]	int BoUpSLP::getTreeCost() {
DEBUG(dbgs() << "SLP: Calculating cost for tree of size " <<		DEBUG(dbgs() << "SLP: Calculating cost for tree of size " <<
VectorizableTree.size() << ".\n");		VectorizableTree.size() << ".\n");

unsigned BundleWidth = VectorizableTree[0].Scalars.size();		unsigned BundleWidth = VectorizableTree[0].Scalars.size();

for (TreeEntry &TE : VectorizableTree) {		for (TreeEntry &TE : VectorizableTree) {
int C = getEntryCost(&TE);		int C = getEntryCost(&TE);
DEBUG(dbgs() << "SLP: Adding cost " << C << " for bundle that starts with "		DEBUG(dbgs() << "SLP: Adding cost " << C << " for bundle that starts with "
<< *TE.Scalars[0] << ".\n");		<< *TE.State.OpValue << ".\n");
Cost += C;		Cost += C;
}		}

SmallSet<Value *, 16> ExtractCostCalculated;		SmallSet<Value *, 16> ExtractCostCalculated;
int ExtractCost = 0;		int ExtractCost = 0;
for (ExternalUser &EU : ExternalUses) {		for (ExternalUser &EU : ExternalUses) {
// We only add extract cost once for the same scalar.		// We only add extract cost once for the same scalar.
if (!ExtractCostCalculated.insert(EU.Scalar).second)		if (!ExtractCostCalculated.insert(EU.Scalar).second)
continue;		continue;

// Uses by ephemeral values are free (because the ephemeral value will be		// Uses by ephemeral values are free (because the ephemeral value will be
// removed prior to code generation, and so the extraction will be		// removed prior to code generation, and so the extraction will be
// removed as well).		// removed as well).
if (EphValues.count(EU.User))		if (EphValues.count(EU.User))
continue;		continue;

// If we plan to rewrite the tree in a smaller type, we will need to sign		// If we plan to rewrite the tree in a smaller type, we will need to sign
// extend the extracted value back to the original type. Here, we account		// extend the extracted value back to the original type. Here, we account
// for the extract and the added cost of the sign extend if needed.		// for the extract and the added cost of the sign extend if needed.
auto *VecTy = VectorType::get(EU.Scalar->getType(), BundleWidth);		auto *VecTy = VectorType::get(EU.Scalar->getType(), BundleWidth);
auto *ScalarRoot = VectorizableTree[0].Scalars[0];		auto *ScalarRoot = VectorizableTree[0].State.OpValue;
if (MinBWs.count(ScalarRoot)) {		if (MinBWs.count(ScalarRoot)) {
auto *MinTy = IntegerType::get(F->getContext(), MinBWs[ScalarRoot].first);		auto *MinTy = IntegerType::get(F->getContext(), MinBWs[ScalarRoot].first);
auto Extend =		auto Extend =
MinBWs[ScalarRoot].second ? Instruction::SExt : Instruction::ZExt;		MinBWs[ScalarRoot].second ? Instruction::SExt : Instruction::ZExt;
VecTy = VectorType::get(MinTy, BundleWidth);		VecTy = VectorType::get(MinTy, BundleWidth);
ExtractCost += TTI->getExtractWithExtendCost(Extend, EU.Scalar->getType(),		ExtractCost += TTI->getExtractWithExtendCost(Extend, EU.Scalar->getType(),
VecTy, EU.Lane);		VecTy, EU.Lane);
} else {		} else {
[41 lines hidden]
// are consecutive loads. This would allow us to vectorize the tree.		// are consecutive loads. This would allow us to vectorize the tree.
// If we have something like-		// If we have something like-
// load a[0] - load b[0]		// load a[0] - load b[0]
// load b[1] + load a[1]		// load b[1] + load a[1]
// load a[2] - load b[2]		// load a[2] - load b[2]
// load a[3] + load b[3]		// load a[3] + load b[3]
// Reordering the second load b[1] load a[1] would allow us to vectorize this		// Reordering the second load b[1] load a[1] would allow us to vectorize this
// code.		// code.
void BoUpSLP::reorderAltShuffleOperands(ArrayRef<Value *> VL,		void BoUpSLP::reorderAltShuffleOperands(unsigned Opcode, ArrayRef<Value *> VL,
SmallVectorImpl<Value *> &Left,		SmallVectorImpl<Value *> &Left,
SmallVectorImpl<Value *> &Right) {		SmallVectorImpl<Value *> &Right) {
// Push left and right operands of binary operation into Left and Right		// Push left and right operands of binary operation into Left and Right
for (Value *i : VL) {		unsigned AltOpcode = getAltOpcode(Opcode);
Left.push_back(cast<Instruction>(i)->getOperand(0));		for (Value *V : VL) {
Right.push_back(cast<Instruction>(i)->getOperand(1));		auto *I = cast<Instruction>(V);
		if (sameOpcodeOrAlt(Opcode, AltOpcode, I->getOpcode())) {
		Left.push_back(I->getOperand(0));
		Right.push_back(I->getOperand(1));
		} else {
		Left.push_back(I);
		Right.push_back(getDefaultConstantForOpcode(Opcode, I->getType()));
		}
}		}

// Reorder if we have a commutative operation and consecutive access		// Reorder if we have a commutative operation and consecutive access
// are on either side of the alternate instructions.		// are on either side of the alternate instructions.
for (unsigned j = 0; j < VL.size() - 1; ++j) {		for (unsigned j = 0; j < VL.size() - 1; ++j) {
if (LoadInst *L = dyn_cast<LoadInst>(Left[j])) {		if (LoadInst *L = dyn_cast<LoadInst>(Left[j])) {
if (LoadInst *L1 = dyn_cast<LoadInst>(Right[j + 1])) {		if (LoadInst *L1 = dyn_cast<LoadInst>(Right[j + 1])) {
Instruction *VL1 = cast<Instruction>(VL[j]);		Instruction *VL1 = cast<Instruction>(VL[j]);
[28 lines hidden]
}		}

// Return true if I should be commuted before adding it's left and right		// Return true if I should be commuted before adding it's left and right
// operands to the arrays Left and Right.		// operands to the arrays Left and Right.
//		//
// The vectorizer is trying to either have all elements one side being		// The vectorizer is trying to either have all elements one side being
// instruction with the same opcode to enable further vectorization, or having		// instruction with the same opcode to enable further vectorization, or having
// a splat to lower the vectorizing cost.		// a splat to lower the vectorizing cost.
static bool shouldReorderOperands(int i, Instruction &I,		static bool shouldReorderOperands(
SmallVectorImpl<Value *> &Left,		int i, unsigned Opcode, Instruction &I, ArrayRef<Value *> Left,
		RKSimon: The i -> Idx change can be pulled out into an NFC pre-commit.
SmallVectorImpl<Value *> &Right,		ArrayRef<Value *> Right, bool AllSameOpcodeLeft, bool AllSameOpcodeRight,
bool AllSameOpcodeLeft,		bool SplatLeft, bool SplatRight, Value &VLeft, Value &VRight) {
bool AllSameOpcodeRight, bool SplatLeft,		if (I.getOpcode() == Opcode) {
bool SplatRight) {		VLeft = I.getOperand(0);
Value *VLeft = I.getOperand(0);		VRight = I.getOperand(1);
Value *VRight = I.getOperand(1);		} else {
		VLeft = &I;
		VRight = getDefaultConstantForOpcode(Opcode, I.getType());
		}
// If we have "SplatRight", try to see if commuting is needed to preserve it.		// If we have "SplatRight", try to see if commuting is needed to preserve it.
if (SplatRight) {		if (SplatRight) {
if (VRight == Right[i - 1])		if (VRight == Right[i - 1])
// Preserve SplatRight		// Preserve SplatRight
return false;		return false;
if (VLeft == Right[i - 1]) {		if (VLeft == Right[i - 1]) {
// Commuting would preserve SplatRight, but we don't want to break		// Commuting would preserve SplatRight, but we don't want to break
// SplatLeft either, i.e. preserve the original order if possible.		// SplatLeft either, i.e. preserve the original order if possible.
[39 lines hidden]	if (AllSameOpcodeLeft) {
if (ILeft && LeftPrevOpcode == ILeft->getOpcode())		if (ILeft && LeftPrevOpcode == ILeft->getOpcode())
return false;		return false;
if (IRight && LeftPrevOpcode == IRight->getOpcode())		if (IRight && LeftPrevOpcode == IRight->getOpcode())
return true;		return true;
}		}
return false;		return false;
}		}

void BoUpSLP::reorderInputsAccordingToOpcode(ArrayRef<Value *> VL,		void BoUpSLP::reorderInputsAccordingToOpcode(unsigned Opcode,
		ArrayRef<Value *> VL,
SmallVectorImpl<Value *> &Left,		SmallVectorImpl<Value *> &Left,
SmallVectorImpl<Value *> &Right) {		SmallVectorImpl<Value *> &Right) {

if (VL.size()) {		if (VL.size()) {
// Peel the first iteration out of the loop since there's nothing		// Peel the first iteration out of the loop since there's nothing
// interesting to do anyway and it simplifies the checks in the loop.		// interesting to do anyway and it simplifies the checks in the loop.
auto VLeft = cast<Instruction>(VL[0])->getOperand(0);		auto *I = cast<Instruction>(VL[0]);
auto VRight = cast<Instruction>(VL[0])->getOperand(1);		Value *VLeft;
		Value *VRight;
		if (I->getOpcode() == Opcode) {
		VLeft = I->getOperand(0);
		VRight = I->getOperand(1);
		} else {
		VLeft = I;
		VRight = getDefaultConstantForOpcode(Opcode, I->getType());
		}
if (!isa<Instruction>(VRight) && isa<Instruction>(VLeft))		if (!isa<Instruction>(VRight) && isa<Instruction>(VLeft))
// Favor having instruction to the right. FIXME: why?		// Favor having instruction to the right. FIXME: why?
std::swap(VLeft, VRight);		std::swap(VLeft, VRight);
Left.push_back(VLeft);		Left.push_back(VLeft);
Right.push_back(VRight);		Right.push_back(VRight);
}		}

// Keep track if we have instructions with all the same opcode on one side.		// Keep track if we have instructions with all the same opcode on one side.
bool AllSameOpcodeLeft = isa<Instruction>(Left[0]);		bool AllSameOpcodeLeft = isa<Instruction>(Left[0]);
bool AllSameOpcodeRight = isa<Instruction>(Right[0]);		bool AllSameOpcodeRight = isa<Instruction>(Right[0]);
// Keep track if we have one side with all the same value (broadcast).		// Keep track if we have one side with all the same value (broadcast).
bool SplatLeft = true;		bool SplatLeft = true;
bool SplatRight = true;		bool SplatRight = true;

for (unsigned i = 1, e = VL.size(); i != e; ++i) {		for (unsigned i = 1, e = VL.size(); i != e; ++i) {
Instruction *I = cast<Instruction>(VL[i]);		Instruction *I = cast<Instruction>(VL[i]);
assert(I->isCommutative() && "Can only process commutative instruction");		assert(((I->getOpcode() == Opcode && I->isCommutative()) ||
		(I->getOpcode() != Opcode && Instruction::isCommutative(Opcode))) &&
		"Can only process commutative instruction");
// Commute to favor either a splat or maximizing having the same opcodes on		// Commute to favor either a splat or maximizing having the same opcodes on
// one side.		// one side.
if (shouldReorderOperands(i, *I, Left, Right, AllSameOpcodeLeft,		Value *VLeft;
AllSameOpcodeRight, SplatLeft, SplatRight)) {		Value *VRight;
Left.push_back(I->getOperand(1));		if (shouldReorderOperands(i, Opcode, *I, Left, Right, AllSameOpcodeLeft,
Right.push_back(I->getOperand(0));		AllSameOpcodeRight, SplatLeft, SplatRight, VLeft,
		VRight)) {
		Left.push_back(VRight);
		Right.push_back(VLeft);
} else {		} else {
Left.push_back(I->getOperand(0));		Left.push_back(VLeft);
Right.push_back(I->getOperand(1));		Right.push_back(VRight);
}		}
// Update Splat* and AllSameOpcode* after the insertion.		// Update Splat* and AllSameOpcode* after the insertion.
SplatRight = SplatRight && (Right[i - 1] == Right[i]);		SplatRight = SplatRight && (Right[i - 1] == Right[i]);
SplatLeft = SplatLeft && (Left[i - 1] == Left[i]);		SplatLeft = SplatLeft && (Left[i - 1] == Left[i]);
AllSameOpcodeLeft = AllSameOpcodeLeft && isa<Instruction>(Left[i]) &&		AllSameOpcodeLeft = AllSameOpcodeLeft && isa<Instruction>(Left[i]) &&
(cast<Instruction>(Left[i - 1])->getOpcode() ==		(cast<Instruction>(Left[i - 1])->getOpcode() ==
cast<Instruction>(Left[i])->getOpcode());		cast<Instruction>(Left[i])->getOpcode());
AllSameOpcodeRight = AllSameOpcodeRight && isa<Instruction>(Right[i]) &&		AllSameOpcodeRight = AllSameOpcodeRight && isa<Instruction>(Right[i]) &&
[36 lines hidden]	if (LoadInst *L = dyn_cast<LoadInst>(Right[j])) {
continue;		continue;
}		}
}		}
}		}
// else unchanged		// else unchanged
}		}
}		}
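The operand-splitting logic above treats a lane whose opcode differs from the bundle opcode as `lane op default-constant`. A minimal sketch of that idea on plain integers follows. The `Opcode` enum, `Lane` struct, `identityFor` and `splitOperands` are all hypothetical, and the identity values are an assumption based on ordinary arithmetic, since the body of `getDefaultConstantForOpcode` is not shown in this part of the diff:

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

enum class Opcode { Add, Mul, And, Other };

// Assumed identity element: X op identity == X.
std::int64_t identityFor(Opcode Op) {
  switch (Op) {
  case Opcode::Add:
    return 0;
  case Opcode::Mul:
    return 1;
  case Opcode::And:
    return -1; // all bits set
  case Opcode::Other:
    break;
  }
  throw std::invalid_argument("no identity element for this opcode");
}

// A lane is either a binary operation (Lhs op Rhs) or, for Opcode::Other,
// a plain value stored in Lhs (Rhs is ignored).
struct Lane {
  Opcode Op;
  std::int64_t Lhs;
  std::int64_t Rhs;
};

// Build the left and right operand lists for a bundle with opcode BundleOp.
std::pair<std::vector<std::int64_t>, std::vector<std::int64_t>>
splitOperands(Opcode BundleOp, const std::vector<Lane> &Lanes) {
  std::vector<std::int64_t> Left, Right;
  for (const Lane &L : Lanes) {
    Left.push_back(L.Lhs);
    // A matching lane contributes its real right operand; any other lane is
    // padded with the identity so the vector operation leaves it unchanged.
    Right.push_back(L.Op == BundleOp ? L.Rhs : identityFor(BundleOp));
  }
  return {Left, Right};
}
```

Applied to the `add1` example from the summary, a plain load followed by `+ 1`, `+ 2` and `+ 3` would produce a right-hand operand list of `{0, 1, 2, 3}`.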

void BoUpSLP::setInsertPointAfterBundle(ArrayRef<Value *> VL) {		void BoUpSLP::setInsertPointAfterBundle(ArrayRef<Value *> VL, Value *OpValue) {

// Get the basic block this bundle is in. All instructions in the bundle		// Get the basic block this bundle is in. All instructions in the bundle
// should be in this block.		// should be in this block.
auto *Front = cast<Instruction>(VL.front());		auto *Front = cast<Instruction>(OpValue);
auto *BB = Front->getParent();		auto *BB = Front->getParent();
assert(all_of(make_range(VL.begin(), VL.end()), [&](Value *V) -> bool {		const unsigned Opcode = cast<Instruction>(OpValue)->getOpcode();
return cast<Instruction>(V)->getParent() == BB;		const unsigned AltOpcode = getAltOpcode(Opcode);
		assert(all_of(make_range(VL.begin(), VL.end()), [=](Value *V) -> bool {
		return !sameOpcodeOrAlt(Opcode, AltOpcode,
		cast<Instruction>(V)->getOpcode()) ||
		cast<Instruction>(V)->getParent() == BB;
}));		}));

// The last instruction in the bundle in program order.		// The last instruction in the bundle in program order.
Instruction *LastInst = nullptr;		Instruction *LastInst = nullptr;

// Find the last instruction. The common case should be that BB has been		// Find the last instruction. The common case should be that BB has been
// scheduled, and the last instruction is VL.back(). So we start with		// scheduled, and the last instruction is VL.back(). So we start with
// VL.back() and iterate over schedule data until we reach the end of the		// VL.back() and iterate over schedule data until we reach the end of the
// bundle. The end of the bundle is marked by null ScheduleData.		// bundle. The end of the bundle is marked by null ScheduleData.
if (BlocksSchedules.count(BB)) {		if (BlocksSchedules.count(BB)) {
auto *Bundle = BlocksSchedules[BB]->getScheduleData(VL.back());		auto *Bundle =
		BlocksSchedules[BB]->getScheduleData(isOneOf(OpValue, VL.back()));
if (Bundle && Bundle->isPartOfBundle())		if (Bundle && Bundle->isPartOfBundle())
for (; Bundle; Bundle = Bundle->NextInBundle)		for (; Bundle; Bundle = Bundle->NextInBundle)
		if (Bundle->OpValue == Bundle->Inst)
LastInst = Bundle->Inst;		LastInst = Bundle->Inst;
}		}
		ABataev: Why can't you put bundles in the list in the right order: from the very first instruction to the very last?
		dtemirbulatov (Author): I could do this in the scheduleBlock() function with a queue, but that could add additional complexity.
		ABataev: Could you explain why?
		dtemirbulatov (Author): One instruction could belong to one or more separate bundles... and while we try to change the order in bundles at scheduleBlock() we have to update ScheduleDataMap and ExtraScheduleDataMap.
		dtemirbulatov (Author): I mean a pseudo operation could occur in more than one bundle.
		ABataev: But these schedule bundles must have different scheduling region ids and they must be in different bundles, so why does their order change?
		dtemirbulatov (Author): The bundle is different, but the scheduling region id is the same.
		dtemirbulatov (Author): I mean, for example, for this function:
		    define void @add0(i32* noalias %dst, i32* noalias %src) {
		    entry:
		      %incdec.ptr = getelementptr inbounds i32, i32* %src, i64 1
		      %0 = load i32, i32* %src, align 4
		      %add = add nsw i32 %0, 1
		      %incdec.ptr1 = getelementptr inbounds i32, i32* %dst, i64 1
		      store i32 %add, i32* %dst, align 4
		      %incdec.ptr2 = getelementptr inbounds i32, i32* %src, i64 2
		      %1 = load i32, i32* %incdec.ptr, align 4
		      %add3 = add nsw i32 %1, 1
		      %incdec.ptr4 = getelementptr inbounds i32, i32* %dst, i64 2
		      store i32 %add3, i32* %incdec.ptr1, align 4
		      %incdec.ptr5 = getelementptr inbounds i32, i32* %src, i64 3
		      %2 = load i32, i32* %incdec.ptr2, align 4
		      %add6 = add nsw i32 %2, 2
		      %incdec.ptr7 = getelementptr inbounds i32, i32* %dst, i64 3
		      store i32 %add6, i32* %incdec.ptr4, align 4
		      %3 = load i32, i32* %incdec.ptr5, align 4
		      %add9 = add nsw i32 %3, 3
		      store i32 %add9, i32* %incdec.ptr7, align 4
		      ret void
		    }
		We have two bundles:
		    [ %3 = load i32, i32* %src, align 4; %add3 = add nsw i32 %2, 1; %add6 = add nsw i32 %1, 2; %add9 = add nsw i32 %0, 3 ]
		and
		    [ %3 = load i32, i32* %src, align 4; %2 = load i32, i32* %incdec.ptr, align 4; %1 = load i32, i32* %incdec.ptr2, align 4; %0 = load i32, i32* %incdec.ptr5, align 4 ]
		Both contain the same instruction %3 = load i32, i32* %src, align 4, which is a pseudo instruction in the bundle [ %3 = load i32, i32* %src, align 4; %add3; %add6; %add9 ], and both are in the same scheduling region, whose id equals 1.
		dtemirbulatov (Author):
		    > why their order changes?
		Sometimes we have to reschedule a pseudo instruction first in both bundles in order to form correct dependencies.
		dtemirbulatov (Author): I will move this solution to a dedicated function, so we don't have to measure distance here.

// LastInst can still be null at this point if there's either not an entry		// LastInst can still be null at this point if there's either not an entry
// for BB in BlocksSchedules or there's no ScheduleData available for		// for BB in BlocksSchedules or there's no ScheduleData available for
		ABataev: Rewrite it this way:
		    SmallPtrSet<Instruction *, 4> BundleInst;
		    Bundle = Bundle->FirstInBundle;
		    LastInst = Bundle->Inst;
		    while (Bundle) {
		      BundleInst.insert(Bundle->Inst);
		      Bundle = Bundle->NextInBundle;
		    }
		dtemirbulatov (Author): Done. Thanks.
// VL.back(). This can be the case if buildTree_rec aborts for various		// VL.back(). This can be the case if buildTree_rec aborts for various
// reasons (e.g., the maximum recursion depth is reached, the maximum region		// reasons (e.g., the maximum recursion depth is reached, the maximum region
// size is reached, etc.). ScheduleData is initialized in the scheduling		// size is reached, etc.). ScheduleData is initialized in the scheduling
		ABataev: Why do you need to scan all the instructions in the basic block starting from `First`? Why can't you use only scheduled instructions?
		dtemirbulatov (Author): What do you mean? Please elaborate. If you mean ScheduleData here, then it also does not always give the correct sequence of scheduled instructions in a block. For example, a bundle member whose NextInBundle is null is not guaranteed to be the last instruction of this bundle among the other instructions. If we start with First, then it is highly likely that we can iterate across all bundle members and exit instead of iterating to the end of the basic block. Also, I am thinking that to avoid this overhead we could record the last scheduled instruction in BB during scheduling in scheduleBlock() and keep this information in the ScheduleData structure.
// "dry-run".		// "dry-run".
		ABataev: Move `++Iter` to the header of the `for` loop.
		dtemirbulatov (Author): Done.
//		//
// If this happens, we can still find the last instruction by brute force. We		// If this happens, we can still find the last instruction by brute force. We
// iterate forwards from Front (inclusive) until we either see all		// iterate forwards from Front (inclusive) until we either see all
// instructions in the bundle or reach the end of the block. If Front is the		// instructions in the bundle or reach the end of the block. If Front is the
// last instruction in program order, LastInst will be set to Front, and we		// last instruction in program order, LastInst will be set to Front, and we
// will visit all the remaining instructions in the block.		// will visit all the remaining instructions in the block.
//		//
// One of the reasons we exit early from buildTree_rec is to place an upper		// One of the reasons we exit early from buildTree_rec is to place an upper
// bound on compile-time. Thus, taking an additional compile-time hit here is		// bound on compile-time. Thus, taking an additional compile-time hit here is
// not ideal. However, this should be exceedingly rare since it requires that		// not ideal. However, this should be exceedingly rare since it requires that
// we both exit early from buildTree_rec and that the bundle be out-of-order		// we both exit early from buildTree_rec and that the bundle be out-of-order
// (causing us to iterate all the way to the end of the block).		// (causing us to iterate all the way to the end of the block).
if (!LastInst) {		if (!LastInst) {
SmallPtrSet<Value *, 16> Bundle(VL.begin(), VL.end());		SmallPtrSet<Value *, 16> Bundle(VL.begin(), VL.end());
for (auto &I : make_range(BasicBlock::iterator(Front), BB->end())) {		for (auto &I : make_range(BasicBlock::iterator(Front), BB->end())) {
if (Bundle.erase(&I))		if (Bundle.erase(&I) && sameOpcodeOrAlt(Opcode, AltOpcode, I.getOpcode()))
LastInst = &I;		LastInst = &I;
if (Bundle.empty())		if (Bundle.empty())
break;		break;
}		}
}		}
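The brute-force fallback described in the comment above can be sketched on a simplified model in which a basic block is a vector of instruction ids in program order. The function name `findLastBundleMember` and the integer-id representation are assumptions made for illustration, and the sketch omits the opcode filter (`sameOpcodeOrAlt`) that the patch adds:

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <unordered_set>
#include <vector>

// Scan forward from FrontIdx and return the position of the last bundle
// member seen, or nullopt if no member lies at or after FrontIdx.
std::optional<std::size_t>
findLastBundleMember(const std::vector<int> &Block, std::size_t FrontIdx,
                     std::unordered_set<int> Bundle) {
  std::optional<std::size_t> Last;
  for (std::size_t I = FrontIdx; I < Block.size(); ++I) {
    // erase() returns the number of removed elements (0 or 1 for a set).
    if (Bundle.erase(Block[I]))
      Last = I;
    // Stop as soon as every bundle member has been seen.
    if (Bundle.empty())
      break;
  }
  return Last;
}
```

When a bundle member precedes the starting position, the set never becomes empty and the loop runs to the end of the block, which is the compile-time cost the original comment warns about.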

// Set the insertion point after the last instruction in the bundle. Set the		// Set the insertion point after the last instruction in the bundle. Set the
// debug location to Front.		// debug location to Front.
[34 lines hidden]	Value *BoUpSLP::alreadyVectorized(ArrayRef<Value *> VL, Value *OpValue) const {
if (const TreeEntry *En = getTreeEntry(OpValue)) {		if (const TreeEntry *En = getTreeEntry(OpValue)) {
if (En->isSame(VL) && En->VectorizedValue)		if (En->isSame(VL) && En->VectorizedValue)
return En->VectorizedValue;		return En->VectorizedValue;
}		}
return nullptr;		return nullptr;
}		}

Value *BoUpSLP::vectorizeTree(ArrayRef<Value *> VL) {		Value *BoUpSLP::vectorizeTree(ArrayRef<Value *> VL) {
if (TreeEntry *E = getTreeEntry(VL[0]))		InstructionsState S = getSameOpcode(VL);
		if (S.Opcode) {
		if (TreeEntry *E = getTreeEntry(S.OpValue)) {
if (E->isSame(VL))		if (E->isSame(VL))
return vectorizeTree(E);		return vectorizeTree(E);
		}
		}

Type *ScalarTy = VL[0]->getType();		Type *ScalarTy = S.OpValue->getType();
if (StoreInst *SI = dyn_cast<StoreInst>(VL[0]))		if (StoreInst *SI = dyn_cast<StoreInst>(S.OpValue))
ScalarTy = SI->getValueOperand()->getType();		ScalarTy = SI->getValueOperand()->getType();
VectorType *VecTy = VectorType::get(ScalarTy, VL.size());		VectorType *VecTy = VectorType::get(ScalarTy, VL.size());

return Gather(VL, VecTy);		return Gather(VL, VecTy);
}		}

Value *BoUpSLP::vectorizeTree(TreeEntry *E) {		Value *BoUpSLP::vectorizeTree(TreeEntry *E) {
IRBuilder<>::InsertPointGuard Guard(Builder);		IRBuilder<>::InsertPointGuard Guard(Builder);

if (E->VectorizedValue) {		if (E->VectorizedValue) {
DEBUG(dbgs() << "SLP: Diamond merged for " << *E->Scalars[0] << ".\n");		DEBUG(dbgs() << "SLP: Diamond merged for " << *E->State.OpValue << ".\n");
return E->VectorizedValue;		return E->VectorizedValue;
}		}

Instruction *VL0 = cast<Instruction>(E->Scalars[0]);		Instruction *VL0 = cast<Instruction>(E->State.OpValue);
		ABataev: `auto *VL0`
		dtemirbulatov (Author): Done.
Type *ScalarTy = VL0->getType();		Type *ScalarTy = VL0->getType();
if (StoreInst *SI = dyn_cast<StoreInst>(VL0))		if (StoreInst *SI = dyn_cast<StoreInst>(VL0))
ScalarTy = SI->getValueOperand()->getType();		ScalarTy = SI->getValueOperand()->getType();
VectorType *VecTy = VectorType::get(ScalarTy, E->Scalars.size());		VectorType *VecTy = VectorType::get(ScalarTy, E->Scalars.size());

if (E->NeedToGather) {		if (E->NeedToGather) {
setInsertPointAfterBundle(E->Scalars);		setInsertPointAfterBundle(E->Scalars, VL0);
auto *V = Gather(E->Scalars, VecTy);		auto *V = Gather(E->Scalars, VecTy);
E->VectorizedValue = V;		E->VectorizedValue = V;
return V;		return V;
}		}

unsigned Opcode = getSameOpcode(E->Scalars);		unsigned ShuffleOrOp = E->State.IsAltShuffle ?
		(unsigned) Instruction::ShuffleVector : E->State.Opcode;
		RKSimon: Please don't embed this in the switch().
switch (Opcode) {		switch (ShuffleOrOp) {
case Instruction::PHI: {		case Instruction::PHI: {
PHINode *PH = dyn_cast<PHINode>(VL0);		PHINode *PH = dyn_cast<PHINode>(VL0);
Builder.SetInsertPoint(PH->getParent()->getFirstNonPHI());		Builder.SetInsertPoint(PH->getParent()->getFirstNonPHI());
Builder.SetCurrentDebugLocation(PH->getDebugLoc());		Builder.SetCurrentDebugLocation(PH->getDebugLoc());
PHINode *NewPhi = Builder.CreatePHI(VecTy, PH->getNumIncomingValues());		PHINode *NewPhi = Builder.CreatePHI(VecTy, PH->getNumIncomingValues());
E->VectorizedValue = NewPhi;		E->VectorizedValue = NewPhi;

// PHINodes may have multiple entries from the same block. We want to		// PHINodes may have multiple entries from the same block. We want to
[25 lines hidden]	switch (ShuffleOrOp) {
}		}

case Instruction::ExtractElement: {		case Instruction::ExtractElement: {
if (canReuseExtract(E->Scalars, VL0)) {		if (canReuseExtract(E->Scalars, VL0)) {
Value *V = VL0->getOperand(0);		Value *V = VL0->getOperand(0);
E->VectorizedValue = V;		E->VectorizedValue = V;
return V;		return V;
}		}
setInsertPointAfterBundle(E->Scalars);		setInsertPointAfterBundle(E->Scalars, VL0);
auto *V = Gather(E->Scalars, VecTy);		auto *V = Gather(E->Scalars, VecTy);
E->VectorizedValue = V;		E->VectorizedValue = V;
return V;		return V;
}		}
case Instruction::ExtractValue: {		case Instruction::ExtractValue: {
if (canReuseExtract(E->Scalars, VL0)) {		if (canReuseExtract(E->Scalars, VL0)) {
LoadInst *LI = cast<LoadInst>(VL0->getOperand(0));		LoadInst *LI = cast<LoadInst>(VL0->getOperand(0));
Builder.SetInsertPoint(LI);		Builder.SetInsertPoint(LI);
PointerType *PtrTy = PointerType::get(VecTy, LI->getPointerAddressSpace());		PointerType *PtrTy = PointerType::get(VecTy, LI->getPointerAddressSpace());
Value *Ptr = Builder.CreateBitCast(LI->getOperand(0), PtrTy);		Value *Ptr = Builder.CreateBitCast(LI->getOperand(0), PtrTy);
LoadInst *V = Builder.CreateAlignedLoad(Ptr, LI->getAlignment());		LoadInst *V = Builder.CreateAlignedLoad(Ptr, LI->getAlignment());
E->VectorizedValue = V;		E->VectorizedValue = V;
return propagateMetadata(V, E->Scalars);		return propagateMetadata(V, E->Scalars);
}		}
setInsertPointAfterBundle(E->Scalars);		setInsertPointAfterBundle(E->Scalars, VL0);
auto *V = Gather(E->Scalars, VecTy);		auto *V = Gather(E->Scalars, VecTy);
E->VectorizedValue = V;		E->VectorizedValue = V;
return V;		return V;
}		}
case Instruction::ZExt:		case Instruction::ZExt:
case Instruction::SExt:		case Instruction::SExt:
case Instruction::FPToUI:		case Instruction::FPToUI:
case Instruction::FPToSI:		case Instruction::FPToSI:
case Instruction::FPExt:		case Instruction::FPExt:
case Instruction::PtrToInt:		case Instruction::PtrToInt:
case Instruction::IntToPtr:		case Instruction::IntToPtr:
case Instruction::SIToFP:		case Instruction::SIToFP:
case Instruction::UIToFP:		case Instruction::UIToFP:
case Instruction::Trunc:		case Instruction::Trunc:
case Instruction::FPTrunc:		case Instruction::FPTrunc:
case Instruction::BitCast: {		case Instruction::BitCast: {
ValueList INVL;		ValueList INVL;
for (Value *V : E->Scalars)		for (Value *V : E->Scalars)
INVL.push_back(cast<Instruction>(V)->getOperand(0));		INVL.push_back(cast<Instruction>(V)->getOperand(0));

setInsertPointAfterBundle(E->Scalars);		setInsertPointAfterBundle(E->Scalars, VL0);

Value *InVec = vectorizeTree(INVL);		Value *InVec = vectorizeTree(INVL);

if (Value *V = alreadyVectorized(E->Scalars, VL0))		if (Value *V = alreadyVectorized(E->Scalars, VL0))
return V;		return V;

CastInst *CI = dyn_cast<CastInst>(VL0);		CastInst *CI = dyn_cast<CastInst>(VL0);
Value *V = Builder.CreateCast(CI->getOpcode(), InVec, VecTy);		Value *V = Builder.CreateCast(CI->getOpcode(), InVec, VecTy);
E->VectorizedValue = V;		E->VectorizedValue = V;
++NumVectorInstructions;		++NumVectorInstructions;
return V;		return V;
}		}
case Instruction::FCmp:		case Instruction::FCmp:
case Instruction::ICmp: {		case Instruction::ICmp: {
ValueList LHSV, RHSV;		ValueList LHSV, RHSV;
for (Value *V : E->Scalars) {		for (Value *V : E->Scalars) {
LHSV.push_back(cast<Instruction>(V)->getOperand(0));		LHSV.push_back(cast<Instruction>(V)->getOperand(0));
RHSV.push_back(cast<Instruction>(V)->getOperand(1));		RHSV.push_back(cast<Instruction>(V)->getOperand(1));
}		}

setInsertPointAfterBundle(E->Scalars);		setInsertPointAfterBundle(E->Scalars, VL0);

Value *L = vectorizeTree(LHSV);		Value *L = vectorizeTree(LHSV);
Value *R = vectorizeTree(RHSV);		Value *R = vectorizeTree(RHSV);

if (Value *V = alreadyVectorized(E->Scalars, VL0))		if (Value *V = alreadyVectorized(E->Scalars, VL0))
return V;		return V;

CmpInst::Predicate P0 = cast<CmpInst>(VL0)->getPredicate();		CmpInst::Predicate P0 = cast<CmpInst>(VL0)->getPredicate();
Value *V;		Value *V;
if (Opcode == Instruction::FCmp)		if (E->State.Opcode == Instruction::FCmp)
V = Builder.CreateFCmp(P0, L, R);		V = Builder.CreateFCmp(P0, L, R);
else		else
V = Builder.CreateICmp(P0, L, R);		V = Builder.CreateICmp(P0, L, R);

E->VectorizedValue = V;		E->VectorizedValue = V;
propagateIRFlags(E->VectorizedValue, E->Scalars);		propagateIRFlags(E->VectorizedValue, E->Scalars, VL0);
++NumVectorInstructions;		++NumVectorInstructions;
return V;		return V;
}		}
case Instruction::Select: {		case Instruction::Select: {
ValueList TrueVec, FalseVec, CondVec;		ValueList TrueVec, FalseVec, CondVec;
for (Value *V : E->Scalars) {		for (Value *V : E->Scalars) {
CondVec.push_back(cast<Instruction>(V)->getOperand(0));		CondVec.push_back(cast<Instruction>(V)->getOperand(0));
TrueVec.push_back(cast<Instruction>(V)->getOperand(1));		TrueVec.push_back(cast<Instruction>(V)->getOperand(1));
FalseVec.push_back(cast<Instruction>(V)->getOperand(2));		FalseVec.push_back(cast<Instruction>(V)->getOperand(2));
}		}

setInsertPointAfterBundle(E->Scalars);		setInsertPointAfterBundle(E->Scalars, VL0);

Value *Cond = vectorizeTree(CondVec);		Value *Cond = vectorizeTree(CondVec);
Value *True = vectorizeTree(TrueVec);		Value *True = vectorizeTree(TrueVec);
Value *False = vectorizeTree(FalseVec);		Value *False = vectorizeTree(FalseVec);

if (Value *V = alreadyVectorized(E->Scalars, VL0))		if (Value *V = alreadyVectorized(E->Scalars, VL0))
return V;		return V;

[17 lines hidden]	switch (ShuffleOrOp) {
case Instruction::Shl:		case Instruction::Shl:
case Instruction::LShr:		case Instruction::LShr:
case Instruction::AShr:		case Instruction::AShr:
case Instruction::And:		case Instruction::And:
case Instruction::Or:		case Instruction::Or:
case Instruction::Xor: {		case Instruction::Xor: {
ValueList LHSVL, RHSVL;		ValueList LHSVL, RHSVL;
if (isa<BinaryOperator>(VL0) && VL0->isCommutative())		if (isa<BinaryOperator>(VL0) && VL0->isCommutative())
reorderInputsAccordingToOpcode(E->Scalars, LHSVL, RHSVL);		reorderInputsAccordingToOpcode(E->State.Opcode, E->Scalars, LHSVL,
		RHSVL);
else		else
for (Value *V : E->Scalars) {		for (Value *V : E->Scalars) {
LHSVL.push_back(cast<Instruction>(V)->getOperand(0));		auto *I = cast<Instruction>(V);
RHSVL.push_back(cast<Instruction>(V)->getOperand(1));		if (I->getOpcode() == E->State.Opcode) {
		LHSVL.push_back(I->getOperand(0));
		RHSVL.push_back(I->getOperand(1));
		} else {
		LHSVL.push_back(V);
		RHSVL.push_back(
		getDefaultConstantForOpcode(E->State.Opcode, I->getType()));
		}
}		}

setInsertPointAfterBundle(E->Scalars);		setInsertPointAfterBundle(E->Scalars, VL0);

Value *LHS = vectorizeTree(LHSVL);		Value *LHS = vectorizeTree(LHSVL);
Value *RHS = vectorizeTree(RHSVL);		Value *RHS = vectorizeTree(RHSVL);

if (Value *V = alreadyVectorized(E->Scalars, VL0))		if (Value *V = alreadyVectorized(E->Scalars, VL0))
return V;		return V;

BinaryOperator *BinOp = cast<BinaryOperator>(VL0);		Value *V = Builder.CreateBinOp(
Value *V = Builder.CreateBinOp(BinOp->getOpcode(), LHS, RHS);		static_cast<Instruction::BinaryOps>(E->State.Opcode), LHS, RHS);
E->VectorizedValue = V;		E->VectorizedValue = V;
propagateIRFlags(E->VectorizedValue, E->Scalars);		propagateIRFlags(E->VectorizedValue, E->Scalars, VL0);
++NumVectorInstructions;		++NumVectorInstructions;

if (Instruction *I = dyn_cast<Instruction>(V))		if (Instruction *I = dyn_cast<Instruction>(V))
return propagateMetadata(I, E->Scalars);		return propagateMetadata(I, E->Scalars);

return V;		return V;
}		}
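A minimal standalone sketch of the padding idea used in the binary-operator case above, assuming (the diff does not confirm this) that `getDefaultConstantForOpcode` returns the right-hand identity of the opcode, so that a lane whose scalar does not match the bundle opcode can be rewritten as `V op identity`. The names `Op`, `Lane`, `identityFor`, `splitOperands` and `apply` are hypothetical and do not exist in SLPVectorizer.cpp; plain `uint32_t` values stand in for LLVM `Value`s.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical opcode set for illustration; not LLVM's Instruction::BinaryOps.
enum class Op { Add, Sub, Mul, Shl, And, Or, Xor, Copy };

// One scalar lane. For Op::Copy, A holds the already-computed value and B is
// ignored; otherwise the lane computes A op B.
struct Lane {
  Op Opc;
  std::uint32_t A;
  std::uint32_t B;
};

// Right-hand identity: apply(Opc, X, identityFor(Opc)) == X.
// Assumption: this mirrors what getDefaultConstantForOpcode is meant to return.
inline std::uint32_t identityFor(Op Opc) {
  assert(Opc != Op::Copy && "a bundle opcode must be a real binary operation");
  switch (Opc) {
  case Op::Mul:
    return 1u;
  case Op::And:
    return 0xFFFFFFFFu; // all ones
  default:
    return 0u; // Add, Sub, Shl, Or, Xor
  }
}

inline std::uint32_t apply(Op Opc, std::uint32_t A, std::uint32_t B) {
  switch (Opc) {
  case Op::Add: return A + B;
  case Op::Sub: return A - B;
  case Op::Mul: return A * B;
  case Op::Shl: return A << (B & 31u);
  case Op::And: return A & B;
  case Op::Or:  return A | B;
  case Op::Xor: return A ^ B;
  case Op::Copy: return A;
  }
  return A;
}

struct Operands {
  std::vector<std::uint32_t> LHS, RHS;
};

// Mirrors the loop in the diff: matching lanes contribute their own operands,
// non-matching ("copyable") lanes contribute (value, identity).
inline Operands splitOperands(Op BundleOpc, const std::vector<Lane> &Lanes) {
  Operands Out;
  for (const Lane &L : Lanes) {
    if (L.Opc == BundleOpc) {
      Out.LHS.push_back(L.A);
      Out.RHS.push_back(L.B);
    } else {
      Out.LHS.push_back(L.A);
      Out.RHS.push_back(identityFor(BundleOpc));
    }
  }
  return Out;
}
```

With the `add1` example from the summary, the lanes `{copy, +1, +2, +3}` yield the right-hand operands `{0, 1, 2, 3}`, so all four lanes can be expressed as a single vector add.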
case Instruction::Load: {		case Instruction::Load: {
// Loads are inserted at the head of the tree because we don't want to		// Loads are inserted at the head of the tree because we don't want to
// sink them all the way down past store instructions.		// sink them all the way down past store instructions.
setInsertPointAfterBundle(E->Scalars);		setInsertPointAfterBundle(E->Scalars, VL0);
		dtemirbulatov (Author): We used E->VL0 here before, which is not correct because it could point to the wrong element.
		ABataev: Why could it point to the wrong element? As I understand it, it should always point to Scalars[0] for a LoadInst.
		dtemirbulatov (Author): No, it is in the order in which it was discovered. I will add a testcase for the issue.
		ABataev: I still don't understand what the problem is here. Why can't we use `VL0` in both cases?
		dtemirbulatov (Author): For consecutive loads [load0, load1, load2, load3], HorizontalReduction::matchAssociativeReduction formed the ReducedVals vector as [load1, load0, load2, load3]. Later, BoUpSLP::buildTree_rec added a TreeEntry with State.OpValue pointing to load1, while TreeEntry->Scalars contained the correct sequence of loads [load0, load1, load2, load3].
		ABataev: Could you investigate why this happens?
		dtemirbulatov (Author): OK.

LoadInst *LI = cast<LoadInst>(VL0);		LoadInst *LI = cast<LoadInst>(VL0);
Type *ScalarLoadTy = LI->getType();		Type *ScalarLoadTy = LI->getType();
unsigned AS = LI->getPointerAddressSpace();		unsigned AS = LI->getPointerAddressSpace();

Value *VecPtr = Builder.CreateBitCast(LI->getPointerOperand(),		Value *VecPtr = Builder.CreateBitCast(LI->getPointerOperand(),
VecTy->getPointerTo(AS));		VecTy->getPointerTo(AS));

[18 lines hidden]	case Instruction::Store: {
StoreInst *SI = cast<StoreInst>(VL0);		StoreInst *SI = cast<StoreInst>(VL0);
unsigned Alignment = SI->getAlignment();		unsigned Alignment = SI->getAlignment();
unsigned AS = SI->getPointerAddressSpace();		unsigned AS = SI->getPointerAddressSpace();

ValueList ValueOp;		ValueList ValueOp;
for (Value *V : E->Scalars)		for (Value *V : E->Scalars)
ValueOp.push_back(cast<StoreInst>(V)->getValueOperand());		ValueOp.push_back(cast<StoreInst>(V)->getValueOperand());

setInsertPointAfterBundle(E->Scalars);		setInsertPointAfterBundle(E->Scalars, VL0);

Value *VecValue = vectorizeTree(ValueOp);		Value *VecValue = vectorizeTree(ValueOp);
Value *VecPtr = Builder.CreateBitCast(SI->getPointerOperand(),		Value *VecPtr = Builder.CreateBitCast(SI->getPointerOperand(),
VecTy->getPointerTo(AS));		VecTy->getPointerTo(AS));
StoreInst *S = Builder.CreateStore(VecValue, VecPtr);		StoreInst *S = Builder.CreateStore(VecValue, VecPtr);

// The pointer operand uses an in-tree scalar so we add the new BitCast to		// The pointer operand uses an in-tree scalar so we add the new BitCast to
// ExternalUses list to make sure that an extract will be generated in the		// ExternalUses list to make sure that an extract will be generated in the
// future.		// future.
Value *PO = SI->getPointerOperand();		Value *PO = SI->getPointerOperand();
if (getTreeEntry(PO))		if (getTreeEntry(PO))
ExternalUses.push_back(ExternalUser(PO, cast<User>(VecPtr), 0));		ExternalUses.push_back(ExternalUser(PO, cast<User>(VecPtr), 0));

if (!Alignment) {		if (!Alignment) {
Alignment = DL->getABITypeAlignment(SI->getValueOperand()->getType());		Alignment = DL->getABITypeAlignment(SI->getValueOperand()->getType());
}		}
S->setAlignment(Alignment);		S->setAlignment(Alignment);
E->VectorizedValue = S;		E->VectorizedValue = S;
++NumVectorInstructions;		++NumVectorInstructions;
return propagateMetadata(S, E->Scalars);		return propagateMetadata(S, E->Scalars);
}		}
case Instruction::GetElementPtr: {		case Instruction::GetElementPtr: {
setInsertPointAfterBundle(E->Scalars);		setInsertPointAfterBundle(E->Scalars, VL0);

ValueList Op0VL;		ValueList Op0VL;
for (Value *V : E->Scalars)		for (Value *V : E->Scalars)
Op0VL.push_back(cast<GetElementPtrInst>(V)->getOperand(0));		Op0VL.push_back(cast<GetElementPtrInst>(V)->getOperand(0));

Value *Op0 = vectorizeTree(Op0VL);		Value *Op0 = vectorizeTree(Op0VL);

std::vector<Value *> OpVecs;		std::vector<Value *> OpVecs;
[14 lines hidden]	case Instruction::GetElementPtr: {

if (Instruction *I = dyn_cast<Instruction>(V))		if (Instruction *I = dyn_cast<Instruction>(V))
return propagateMetadata(I, E->Scalars);		return propagateMetadata(I, E->Scalars);

return V;		return V;
}		}
case Instruction::Call: {		case Instruction::Call: {
CallInst *CI = cast<CallInst>(VL0);		CallInst *CI = cast<CallInst>(VL0);
setInsertPointAfterBundle(VL0);		setInsertPointAfterBundle(E->Scalars, VL0);
Function *FI;		Function *FI;
Intrinsic::ID IID = Intrinsic::not_intrinsic;		Intrinsic::ID IID = Intrinsic::not_intrinsic;
Value *ScalarArg = nullptr;		Value *ScalarArg = nullptr;
if (CI && (FI = CI->getCalledFunction())) {		if (CI && (FI = CI->getCalledFunction())) {
IID = FI->getIntrinsicID();		IID = FI->getIntrinsicID();
}		}
std::vector<Value *> OpVecs;		std::vector<Value *> OpVecs;
for (int j = 0, e = CI->getNumArgOperands(); j < e; ++j) {		for (int j = 0, e = CI->getNumArgOperands(); j < e; ++j) {
ValueList OpVL;		ValueList OpVL;
// ctlz,cttz and powi are special intrinsics whose second argument is		// ctlz,cttz and powi are special intrinsics whose second argument is
// a scalar. This argument should not be vectorized.		// a scalar. This argument should not be vectorized.
if (hasVectorInstrinsicScalarOpd(IID, 1) && j == 1) {		if (hasVectorInstrinsicScalarOpd(IID, 1) && j == 1) {
CallInst *CEI = cast<CallInst>(E->Scalars[0]);		CallInst *CEI = cast<CallInst>(VL0);
ScalarArg = CEI->getArgOperand(j);		ScalarArg = CEI->getArgOperand(j);
OpVecs.push_back(CEI->getArgOperand(j));		OpVecs.push_back(CEI->getArgOperand(j));
continue;		continue;
}		}
for (Value *V : E->Scalars) {		for (Value *V : E->Scalars) {
CallInst *CEI = cast<CallInst>(V);		CallInst *CEI = cast<CallInst>(V);
OpVL.push_back(CEI->getArgOperand(j));		OpVL.push_back(CEI->getArgOperand(j));
}		}
[13 lines hidden]	case Instruction::Call: {

// The scalar argument uses an in-tree scalar so we add the new vectorized		// The scalar argument uses an in-tree scalar so we add the new vectorized
// call to ExternalUses list to make sure that an extract will be		// call to ExternalUses list to make sure that an extract will be
// generated in the future.		// generated in the future.
if (ScalarArg && getTreeEntry(ScalarArg))		if (ScalarArg && getTreeEntry(ScalarArg))
ExternalUses.push_back(ExternalUser(ScalarArg, cast<User>(V), 0));		ExternalUses.push_back(ExternalUser(ScalarArg, cast<User>(V), 0));

E->VectorizedValue = V;		E->VectorizedValue = V;
propagateIRFlags(E->VectorizedValue, E->Scalars);		propagateIRFlags(E->VectorizedValue, E->Scalars, VL0);
++NumVectorInstructions;		++NumVectorInstructions;
return V;		return V;
}		}
case Instruction::ShuffleVector: {		case Instruction::ShuffleVector: {
ValueList LHSVL, RHSVL;		ValueList LHSVL, RHSVL;
assert(isa<BinaryOperator>(VL0) && "Invalid Shuffle Vector Operand");		assert(Instruction::isBinaryOp(E->State.Opcode) &&
reorderAltShuffleOperands(E->Scalars, LHSVL, RHSVL);		"Invalid Shuffle Vector Operand");
setInsertPointAfterBundle(E->Scalars);		reorderAltShuffleOperands(E->State.Opcode, E->Scalars, LHSVL, RHSVL);
		setInsertPointAfterBundle(E->Scalars, VL0);

Value *LHS = vectorizeTree(LHSVL);		Value *LHS = vectorizeTree(LHSVL);
Value *RHS = vectorizeTree(RHSVL);		Value *RHS = vectorizeTree(RHSVL);

if (Value *V = alreadyVectorized(E->Scalars, VL0))		if (Value *V = alreadyVectorized(E->Scalars, VL0))
return V;		return V;

// Create a vector of LHS op1 RHS		// Create a vector of LHS op1 RHS
BinaryOperator *BinOp0 = cast<BinaryOperator>(VL0);		Value *V0 = Builder.CreateBinOp(
Value *V0 = Builder.CreateBinOp(BinOp0->getOpcode(), LHS, RHS);		static_cast<Instruction::BinaryOps>(E->State.Opcode), LHS, RHS);

		unsigned AltOpcode = getAltOpcode(E->State.Opcode);
// Create a vector of LHS op2 RHS		// Create a vector of LHS op2 RHS
Instruction *VL1 = cast<Instruction>(E->Scalars[1]);		Value *V1 = Builder.CreateBinOp(
BinaryOperator *BinOp1 = cast<BinaryOperator>(VL1);		static_cast<Instruction::BinaryOps>(AltOpcode), LHS, RHS);
Value *V1 = Builder.CreateBinOp(BinOp1->getOpcode(), LHS, RHS);

// Create shuffle to take alternate operations from the vector.		// Create shuffle to take alternate operations from the vector.
// Also, gather up odd and even scalar ops to propagate IR flags to		// Also, gather up odd and even scalar ops to propagate IR flags to
// each vector operation.		// each vector operation.
ValueList OddScalars, EvenScalars;		ValueList OddScalars, EvenScalars;
unsigned e = E->Scalars.size();		unsigned e = E->Scalars.size();
SmallVector<Constant *, 8> Mask(e);		SmallVector<Constant *, 8> Mask(e);
for (unsigned i = 0; i < e; ++i) {		for (unsigned i = 0; i < e; ++i) {
if (isOdd(i)) {		if (isOdd(i)) {
Mask[i] = Builder.getInt32(e + i);		Mask[i] = Builder.getInt32(e + i);
OddScalars.push_back(E->Scalars[i]);		OddScalars.push_back(E->Scalars[i]);
} else {		} else {
Mask[i] = Builder.getInt32(i);		Mask[i] = Builder.getInt32(i);
EvenScalars.push_back(E->Scalars[i]);		EvenScalars.push_back(E->Scalars[i]);
}		}
}		}

Value *ShuffleMask = ConstantVector::get(Mask);		Value *ShuffleMask = ConstantVector::get(Mask);
propagateIRFlags(V0, EvenScalars);		InstructionsState S = getSameOpcode(EvenScalars);
propagateIRFlags(V1, OddScalars);		assert(!S.IsAltShuffle && "Unexpected alternate opcode");
		RKSimon: Always add an assert message.
		propagateIRFlags(V0, EvenScalars, S.OpValue);

		S = getSameOpcode(OddScalars);
		assert(!S.IsAltShuffle && "Unexpected alternate opcode");
		RKSimon: Always add an assert message.
		propagateIRFlags(V1, OddScalars, S.OpValue);

Value *V = Builder.CreateShuffleVector(V0, V1, ShuffleMask);		Value *V = Builder.CreateShuffleVector(V0, V1, ShuffleMask);
E->VectorizedValue = V;		E->VectorizedValue = V;
++NumVectorInstructions;		++NumVectorInstructions;
if (Instruction *I = dyn_cast<Instruction>(V))		if (Instruction *I = dyn_cast<Instruction>(V))
return propagateMetadata(I, E->Scalars);		return propagateMetadata(I, E->Scalars);

return V;		return V;
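The alternate-opcode shuffle built in the case above can be illustrated outside LLVM. The following is a sketch only: `buildAltMask` and `shuffle` are hypothetical helpers operating on `std::vector`, modelling the mask loop (`Mask[i] = e + i` for odd lanes, `i` for even lanes) and the two-input shuffle semantics, where indices below the vector width select from the first input and the remaining indices select from the second.

```cpp
#include <cassert>
#include <vector>

// Builds the mask used to interleave two E-wide vectors: even lanes come
// from the first vector, odd lanes from the second (offset by E).
inline std::vector<unsigned> buildAltMask(unsigned E) {
  std::vector<unsigned> Mask(E);
  for (unsigned I = 0; I < E; ++I)
    Mask[I] = (I % 2 != 0) ? E + I : I;
  return Mask;
}

// Models a two-input shuffle: an index below V0.size() picks from V0,
// anything else picks from V1.
inline std::vector<int> shuffle(const std::vector<int> &V0,
                                const std::vector<int> &V1,
                                const std::vector<unsigned> &Mask) {
  assert(V0.size() == V1.size() && "shuffle inputs must have the same width");
  std::vector<int> Out;
  Out.reserve(Mask.size());
  for (unsigned Idx : Mask) {
    assert(Idx < 2 * V0.size() && "mask index out of range");
    Out.push_back(Idx < V0.size() ? V0[Idx] : V1[Idx - V0.size()]);
  }
  return Out;
}
```

For a four-wide bundle the mask is `{0, 5, 2, 7}`, so even lanes are taken from the first vector (`V0`) and odd lanes from the second (`V1`).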
[18 lines hidden]	BoUpSLP::vectorizeTree(ExtraValueToDebugLocsMap &ExternallyUsedValues) {
}		}

Builder.SetInsertPoint(&F->getEntryBlock().front());		Builder.SetInsertPoint(&F->getEntryBlock().front());
auto *VectorRoot = vectorizeTree(&VectorizableTree[0]);		auto *VectorRoot = vectorizeTree(&VectorizableTree[0]);

// If the vectorized tree can be rewritten in a smaller type, we truncate the		// If the vectorized tree can be rewritten in a smaller type, we truncate the
// vectorized root. InstCombine will then rewrite the entire expression. We		// vectorized root. InstCombine will then rewrite the entire expression. We
// sign extend the extracted values below.		// sign extend the extracted values below.
auto *ScalarRoot = VectorizableTree[0].Scalars[0];		auto *ScalarRoot = VectorizableTree[0].State.OpValue;
if (MinBWs.count(ScalarRoot)) {		if (MinBWs.count(ScalarRoot)) {
if (auto *I = dyn_cast<Instruction>(VectorRoot))		if (auto *I = dyn_cast<Instruction>(VectorRoot))
Builder.SetInsertPoint(&*++BasicBlock::iterator(I));		Builder.SetInsertPoint(&*++BasicBlock::iterator(I));
auto BundleWidth = VectorizableTree[0].Scalars.size();		auto BundleWidth = VectorizableTree[0].Scalars.size();
auto *MinTy = IntegerType::get(F->getContext(), MinBWs[ScalarRoot].first);		auto *MinTy = IntegerType::get(F->getContext(), MinBWs[ScalarRoot].first);
auto *VecTy = VectorType::get(MinTy, BundleWidth);		auto *VecTy = VectorType::get(MinTy, BundleWidth);
auto *Trunc = Builder.CreateTrunc(VectorRoot, VecTy);		auto *Trunc = Builder.CreateTrunc(VectorRoot, VecTy);
VectorizableTree[0].VectorizedValue = Trunc;		VectorizableTree[0].VectorizedValue = Trunc;
[17 lines hidden]	for (const auto &ExternalUse : ExternalUses) {
llvm::User *User = ExternalUse.User;		llvm::User *User = ExternalUse.User;

// Skip users that we already RAUW. This happens when one instruction		// Skip users that we already RAUW. This happens when one instruction
// has multiple uses of the same value.		// has multiple uses of the same value.
if (User && !is_contained(Scalar->users(), User))		if (User && !is_contained(Scalar->users(), User))
continue;		continue;
TreeEntry *E = getTreeEntry(Scalar);		TreeEntry *E = getTreeEntry(Scalar);
assert(E && "Invalid scalar");		assert(E && "Invalid scalar");

assert(!E->NeedToGather && "Extracting from a gather list");		assert(!E->NeedToGather && "Extracting from a gather list");

Value *Vec = E->VectorizedValue;		Value *Vec = E->VectorizedValue;
assert(Vec && "Can't find vectorizable value");		assert(Vec && "Can't find vectorizable value");

Value *Lane = Builder.getInt32(ExternalUse.Lane);		Value *Lane = Builder.getInt32(ExternalUse.Lane);
// If User == nullptr, the Scalar is used as extra arg. Generate		// If User == nullptr, the Scalar is used as extra arg. Generate
// ExtractElement instruction and update the record for this scalar in		// ExtractElement instruction and update the record for this scalar in
[54 lines hidden]	for (const auto &ExternalUse : ExternalUses) {

DEBUG(dbgs() << "SLP: Replaced:" << *User << ".\n");		DEBUG(dbgs() << "SLP: Replaced:" << *User << ".\n");
}		}

// For each vectorized value:		// For each vectorized value:
for (TreeEntry &EIdx : VectorizableTree) {		for (TreeEntry &EIdx : VectorizableTree) {
TreeEntry *Entry = &EIdx;		TreeEntry *Entry = &EIdx;

// For each lane:
for (int Lane = 0, LE = Entry->Scalars.size(); Lane != LE; ++Lane) {
Value *Scalar = Entry->Scalars[Lane];
// No need to handle users of gathered values.		// No need to handle users of gathered values.
if (Entry->NeedToGather)		if (Entry->NeedToGather)
		RKSimon: Can this (and the assert below) be pulled out of the inner loop as an NFC commit now? It looks loop-invariant.
		dtemirbulatov (Author): Hmm, Entry is updated on every iteration of the loop above.
continue;		continue;

assert(Entry->VectorizedValue && "Can't find vectorizable value");		assert(Entry->VectorizedValue && "Can't find vectorizable value");

		// For each lane:
		const unsigned Opcode = Entry->State.Opcode;
		const unsigned AltOpcode = getAltOpcode(Opcode);
		for (int Lane = 0, LE = Entry->Scalars.size(); Lane != LE; ++Lane) {
		RKSimon: `for (auto *Scalar : Entry->Scalars)`, possibly as an NFC pre-commit.
		dtemirbulatov (Author): The "Lane" variable is required in the code below.
		Value *Scalar = Entry->Scalars[Lane];

		if (!sameOpcodeOrAlt(Opcode, AltOpcode,
		cast<Instruction>(Scalar)->getOpcode()))
		continue;

Type *Ty = Scalar->getType();		Type *Ty = Scalar->getType();
if (!Ty->isVoidTy()) {		if (!Ty->isVoidTy()) {
#ifndef NDEBUG		#ifndef NDEBUG
for (User *U : Scalar->users()) {		for (User *U : Scalar->users()) {
DEBUG(dbgs() << "SLP: \tvalidating user:" << *U << ".\n");		DEBUG(dbgs() << "SLP: \tvalidating user:" << *U << ".\n");

// It is legal to replace users in the ignorelist by undef.		// It is legal to replace users in the ignorelist by undef.
		ABataev: Variable names must start with a capital letter.
		dtemirbulatov (Author): Done.
assert((getTreeEntry(U) || is_contained(UserIgnoreList, U)) &&		assert((getTreeEntry(U) || is_contained(UserIgnoreList, U)) &&
"Replacing out-of-tree value with undef");		"Replacing out-of-tree value with undef");
}		}
		ABataev: 1. The code is not formatted. 2. It seems to me you missed a `break;`.
		dtemirbulatov (Author): Correct, thanks.
#endif		#endif
Value *Undef = UndefValue::get(Ty);		Value *Undef = UndefValue::get(Ty);
Scalar->replaceAllUsesWith(Undef);		Scalar->replaceAllUsesWith(Undef);
}		}
DEBUG(dbgs() << "SLP: \tErasing scalar:" << *Scalar << ".\n");		DEBUG(dbgs() << "SLP: \tErasing scalar:" << *Scalar << ".\n");
eraseInstruction(cast<Instruction>(Scalar));		eraseInstruction(cast<Instruction>(Scalar));
		ABataev: Better to do it this way: `if (llvm::any_of(Scalar->users(), [this, Entry, Scalar](User *U) { return !getTreeEntry(U) && getTreeEntry(U, Scalar) == Entry; })) continue;`
}		}
}		}

Builder.ClearInsertionPoint();		Builder.ClearInsertionPoint();

return VectorizableTree[0].VectorizedValue;		return VectorizableTree[0].VectorizedValue;
}		}

[83 lines hidden]
}		}

// Groups the instructions to a bundle (which is then a single scheduling entity)		// Groups the instructions to a bundle (which is then a single scheduling entity)
// and schedules instructions until the bundle gets ready.		// and schedules instructions until the bundle gets ready.
bool BoUpSLP::BlockScheduling::tryScheduleBundle(ArrayRef<Value *> VL,		bool BoUpSLP::BlockScheduling::tryScheduleBundle(ArrayRef<Value *> VL,
BoUpSLP *SLP, Value *OpValue) {		BoUpSLP *SLP, Value *OpValue) {
if (isa<PHINode>(OpValue))		if (isa<PHINode>(OpValue))
return true;		return true;

		ABataev: `DominatorTree *DT` is not required.
		dtemirbulatov (Author): Done.
// Initialize the instruction bundle.		// Initialize the instruction bundle.
Instruction *OldScheduleEnd = ScheduleEnd;		Instruction *OldScheduleEnd = ScheduleEnd;
ScheduleData *PrevInBundle = nullptr;		ScheduleData *PrevInBundle = nullptr;
ScheduleData *Bundle = nullptr;		ScheduleData *Bundle = nullptr;
bool ReSchedule = false;		bool ReSchedule = false;
DEBUG(dbgs() << "SLP: bundle: " << *OpValue << "\n");		DEBUG(dbgs() << "SLP: bundle: " << *OpValue << "\n");

// Make sure that the scheduling region contains all		// Make sure that the scheduling region contains all
// instructions of the bundle.		// instructions of the bundle.
		ABataev: What is the `real vector operation`? And why do you need this check? Do you mean that the sequence `load, add` is allowed, but not `add, load`? Why?
		ABataev: It seems to me that what you need to fix is the `setInsertPointAfterBundle` function, rather than adding or checking domination here.
		dtemirbulatov (Author): Yes, thanks, I see the problem in the setInsertPointAfterBundle function.
for (Value *V : VL) {		for (Value *V : VL) {
		ABataev: `auto *I`
if (!extendSchedulingRegion(V))		if (!extendSchedulingRegion(V, OpValue))
return false;		return false;
		ABataev: It seems to me we should extend the scheduling region only for instructions with the same or alternate opcode. Maybe that will resolve all your problems.
		dtemirbulatov (Author): No, in that case we would not be able to extend the scheduling region with non-alternative opcodes.
		ABataev: Did you try that? Actually, we're not going to vectorize instructions with different opcodes. So maybe we should exclude them from the scheduling region?
		dtemirbulatov (Author): Oh, I see what you mean now.
}		}

for (Value *V : VL) {		for (Value *V : VL) {
ScheduleData *BundleMember = getScheduleData(V);		ScheduleData *BundleMember = getScheduleData(V, isOneOf(OpValue, V));
assert(BundleMember &&		assert(BundleMember &&
"no ScheduleData for bundle member (maybe not in same basic block)");		"no ScheduleData for bundle member (maybe not in same basic block)");
		ABataev: Why are you checking for `ExtractValue`s here?
		dtemirbulatov (Author): Done. I no longer have any limitation for the extract element operation.
if (BundleMember->IsScheduled) {		if (BundleMember->IsScheduled) {
// A bundle member was scheduled as single instruction before and now		// A bundle member was scheduled as single instruction before and now
		ABataev: `auto *I1`
		dtemirbulatov (Author): Done.
// needs to be scheduled as part of the bundle. We just get rid of the		// needs to be scheduled as part of the bundle. We just get rid of the
		ABataev: 1. `I1 != nullptr` -> `I1`. 2. `I1` is a very bad name.
		dtemirbulatov (Author): Done.
// existing schedule.		// existing schedule.
DEBUG(dbgs() << "SLP: reset schedule because " << *BundleMember		DEBUG(dbgs() << "SLP: reset schedule because " << *BundleMember
<< " was already scheduled\n");		<< " was already scheduled\n");
ReSchedule = true;		ReSchedule = true;
}		}
		ABataev: I still don't understand what you are checking for here. This needs more description and real tests.
		dtemirbulatov (Author): I hit the issue without the "(SameOrAlt <= (VL.size() / 2))" limitation; I can send you testcases offline. But I believe we could trigger the issue without this check even in the current implementation.
		ABataev: The check itself is very expensive and unclear, which is why it raises a lot of questions. You should describe the problem in more detail, add a test case to the patch that reveals the problem, and prove that this fix is universal.
		dtemirbulatov (Author): Added testcase internal-dep.ll for that. The issue can be reproduced if we remove this check and remove the other "(SameOrAlt <= (VL.size() / 2))" limitation at line 1556. I believe the issue is present even in the current implementation, but I just don't have a large enough code base to prove it.
assert(BundleMember->isSchedulingEntity() &&		assert(BundleMember->isSchedulingEntity() &&
"bundle member already part of other bundle");		"bundle member already part of other bundle");
		ABataev: `auto *UserInst`
		dtemirbulatov (Author): Done.
if (PrevInBundle) {		if (PrevInBundle) {
PrevInBundle->NextInBundle = BundleMember;		PrevInBundle->NextInBundle = BundleMember;
		dtemirbulatov (Author): I have observed some synthetic testcases where cyclic dependencies could form; I am sure that without this check we could hit the issue somehow.
		ABataev: I rather doubt that this is required. Actually, you're not vectorizing the instruction itself, but the alternative pseudo-operation.
} else {		} else {
Bundle = BundleMember;		Bundle = BundleMember;
		ABataev: 1. `auto *I1`. 2. `I1` is a very bad name.
		dtemirbulatov (Author): Done.
}		}
BundleMember->UnscheduledDepsInBundle = 0;		BundleMember->UnscheduledDepsInBundle = 0;
Bundle->UnscheduledDepsInBundle += BundleMember->UnscheduledDeps;		Bundle->UnscheduledDepsInBundle += BundleMember->UnscheduledDeps;

// Group the instructions to a bundle.		// Group the instructions to a bundle.
BundleMember->FirstInBundle = Bundle;		BundleMember->FirstInBundle = Bundle;
PrevInBundle = BundleMember;		PrevInBundle = BundleMember;
}		}
if (ScheduleEnd != OldScheduleEnd) {		if (ScheduleEnd != OldScheduleEnd) {
		dtemirbulatov: We have to clear all dependency calculations, since the pseudo instruction might use an already calculated SD node with calculated dependencies; see the scheduling_pseudo.ll testcase.
// The scheduling region got new instructions at the lower end (or it is a		// The scheduling region got new instructions at the lower end (or it is a
// new region for the first bundle). This makes it necessary to		// new region for the first bundle). This makes it necessary to
// recalculate all dependencies.		// recalculate all dependencies.
// It is seldom that this needs to be done a second time after adding the		// It is seldom that this needs to be done a second time after adding the
// initial bundle to the region.		// initial bundle to the region.
for (auto *I = ScheduleStart; I != ScheduleEnd; I = I->getNextNode()) {		for (auto *I = ScheduleStart; I != ScheduleEnd; I = I->getNextNode()) {
ScheduleData *SD = getScheduleData(I);		doForAllOpcodes(I, [](ScheduleData *SD) {
SD->clearDependencies();		SD->clearDependencies();
		});
}		}
ReSchedule = true;		ReSchedule = true;
}		}
if (ReSchedule) {		if (ReSchedule) {
resetSchedule();		resetSchedule();
initialFillReadyList(ReadyInsts);		initialFillReadyList(ReadyInsts);
}		}

Show All 10 Lines	while (!Bundle->isReady() && !ReadyInsts.empty()) {

ScheduleData *pickedSD = ReadyInsts.back();		ScheduleData *pickedSD = ReadyInsts.back();
ReadyInsts.pop_back();		ReadyInsts.pop_back();

if (pickedSD->isSchedulingEntity() && pickedSD->isReady()) {		if (pickedSD->isSchedulingEntity() && pickedSD->isReady()) {
schedule(pickedSD, ReadyInsts);		schedule(pickedSD, ReadyInsts);
}		}
}		}
if (!Bundle->isReady()) {		if (!Bundle->isReady()) {
		ABataev: Remove this line.
cancelScheduling(VL, OpValue);		cancelScheduling(VL, OpValue);
return false;		return false;
}		}
return true;		return true;
		ABataev: I think you can check it earlier.
		dtemirbulatov: Done.
}		}

		ABataev: `auto *OpInstr`
		dtemirbulatov: Done.
void BoUpSLP::BlockScheduling::cancelScheduling(ArrayRef<Value *> VL,		void BoUpSLP::BlockScheduling::cancelScheduling(ArrayRef<Value *> VL,
Value *OpValue) {		Value *OpValue) {
		RKSimon: You might be able to update cancelScheduling to use this extra arg as an NFC change, and add VL0 in the calls.
if (isa<PHINode>(OpValue))		if (isa<PHINode>(OpValue))
		ABataev: `auto *Instr`
		dtemirbulatov: Done.
return;		return;

ScheduleData *Bundle = getScheduleData(OpValue);		ScheduleData *Bundle = getScheduleData(OpValue)->FirstInBundle;
DEBUG(dbgs() << "SLP: cancel scheduling of " << *Bundle << "\n");		DEBUG(dbgs() << "SLP: cancel scheduling of " << *Bundle << "\n");
assert(!Bundle->IsScheduled &&		assert(!Bundle->IsScheduled &&
"Can't cancel bundle which is already scheduled");		"Can't cancel bundle which is already scheduled");
assert(Bundle->isSchedulingEntity() && Bundle->isPartOfBundle() &&		assert(Bundle->isSchedulingEntity() && Bundle->isPartOfBundle() &&
"tried to unbundle something which is not a bundle");		"tried to unbundle something which is not a bundle");

// Un-bundle: make single instructions out of the bundle.		// Un-bundle: make single instructions out of the bundle.
ScheduleData *BundleMember = Bundle;		ScheduleData *BundleMember = Bundle;
while (BundleMember) {		while (BundleMember) {
assert(BundleMember->FirstInBundle == Bundle && "corrupt bundle links");		assert(BundleMember->FirstInBundle == Bundle && "corrupt bundle links");
BundleMember->FirstInBundle = BundleMember;		BundleMember->FirstInBundle = BundleMember;
ScheduleData *Next = BundleMember->NextInBundle;		ScheduleData *Next = BundleMember->NextInBundle;
BundleMember->NextInBundle = nullptr;		BundleMember->NextInBundle = nullptr;
BundleMember->UnscheduledDepsInBundle = BundleMember->UnscheduledDeps;		BundleMember->UnscheduledDepsInBundle = BundleMember->UnscheduledDeps;
if (BundleMember->UnscheduledDepsInBundle == 0) {		if (BundleMember->UnscheduledDepsInBundle == 0) {
ReadyInsts.insert(BundleMember);		ReadyInsts.insert(BundleMember);
}		}
BundleMember = Next;		BundleMember = Next;
}		}
}		}

bool BoUpSLP::BlockScheduling::extendSchedulingRegion(Value *V) {		BoUpSLP::ScheduleData *BoUpSLP::BlockScheduling::allocateScheduleDataChunks() {
if (getScheduleData(V))		// Allocate a new ScheduleData for the instruction.
		if (ChunkPos >= ChunkSize) {
		ScheduleDataChunks.push_back(llvm::make_unique<ScheduleData[]>(ChunkSize));
		ChunkPos = 0;
		}
		return &(ScheduleDataChunks.back()[ChunkPos++]);
		}
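
Note on the helper above: `allocateScheduleDataChunks()` hands out objects from fixed-size arrays, so pointers to earlier `ScheduleData` objects stay valid when a new chunk is appended. The following is a minimal standalone sketch of that pattern, not the patch's code. `Node` and `ChunkPool` are hypothetical names, `std::make_unique` stands in for `llvm::make_unique`, and the chunk size is assumed to be greater than zero.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for ScheduleData; only one field for illustration.
struct Node {
  int Id = 0;
};

// Hands out Node objects from fixed-size arrays. Growing the pool appends a
// new array and never moves existing nodes, so earlier pointers stay valid.
// Assumes ChunkSize > 0.
class ChunkPool {
public:
  explicit ChunkPool(std::size_t ChunkSize)
      : ChunkSize(ChunkSize), ChunkPos(ChunkSize) {}

  Node *allocate() {
    // Start a new chunk when the current one is exhausted (or none exists).
    if (ChunkPos >= ChunkSize) {
      Chunks.push_back(std::make_unique<Node[]>(ChunkSize));
      ChunkPos = 0;
    }
    return &Chunks.back()[ChunkPos++];
  }

  std::size_t numChunks() const { return Chunks.size(); }

private:
  std::vector<std::unique_ptr<Node[]>> Chunks;
  std::size_t ChunkSize;
  std::size_t ChunkPos;
};
```

Initializing `ChunkPos` to `ChunkSize` forces the first call to allocate a chunk, which is presumably how the real class starts out as well.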

		bool BoUpSLP::BlockScheduling::extendSchedulingRegion(Value *V,
		Value *OpValue) {
		if (getScheduleData(V, isOneOf(OpValue, V)))
return true;		return true;
Instruction *I = dyn_cast<Instruction>(V);		Instruction *I = dyn_cast<Instruction>(V);
assert(I && "bundle member must be an instruction");		assert(I && "bundle member must be an instruction");
assert(!isa<PHINode>(I) && "phi nodes don't need to be scheduled");		assert(!isa<PHINode>(I) && "phi nodes don't need to be scheduled");
		auto &&CheckSheduleForI = [this, OpValue](Instruction *I) -> bool {
		if (ScheduleData *ISD = getScheduleData(I)) {
		assert(isInSchedulingRegion(ISD) &&
		"new ScheduleData already in scheduling region");
		(void)ISD;
		ScheduleData *SD = allocateScheduleDataChunks();
		SD->Inst = I;
		SD->init(SchedulingRegionID, OpValue);
		ExtraScheduleDataMap[I][OpValue] = SD;
		return true;
		}
		return false;
		};
		if (CheckSheduleForI(I))
		return true;
if (!ScheduleStart) {		if (!ScheduleStart) {
// It's the first instruction in the new region.		// It's the first instruction in the new region.
initScheduleData(I, I->getNextNode(), nullptr, nullptr);		initScheduleData(I, I->getNextNode(), nullptr, nullptr);
ScheduleStart = I;		ScheduleStart = I;
ScheduleEnd = I->getNextNode();		ScheduleEnd = I->getNextNode();
		if (isOneOf(OpValue, I) != I)
		CheckSheduleForI(I);
		RKSimon: Do we need the (void)?
assert(ScheduleEnd && "tried to vectorize a TerminatorInst?");		assert(ScheduleEnd && "tried to vectorize a TerminatorInst?");
DEBUG(dbgs() << "SLP: initialize schedule region to " << *I << "\n");		DEBUG(dbgs() << "SLP: initialize schedule region to " << *I << "\n");
return true;		return true;
}		}
// Search up and down at the same time, because we don't know if the new		// Search up and down at the same time, because we don't know if the new
// instruction is above or below the existing scheduling region.		// instruction is above or below the existing scheduling region.
BasicBlock::reverse_iterator UpIter =		BasicBlock::reverse_iterator UpIter =
++ScheduleStart->getIterator().getReverse();		++ScheduleStart->getIterator().getReverse();
BasicBlock::reverse_iterator UpperEnd = BB->rend();		BasicBlock::reverse_iterator UpperEnd = BB->rend();
BasicBlock::iterator DownIter = ScheduleEnd->getIterator();		BasicBlock::iterator DownIter = ScheduleEnd->getIterator();
BasicBlock::iterator LowerEnd = BB->end();		BasicBlock::iterator LowerEnd = BB->end();
for (;;) {		for (;;) {
if (++ScheduleRegionSize > ScheduleRegionSizeLimit) {		if (++ScheduleRegionSize > ScheduleRegionSizeLimit) {
DEBUG(dbgs() << "SLP: exceeded schedule region size limit\n");		DEBUG(dbgs() << "SLP: exceeded schedule region size limit\n");
return false;		return false;
}		}

if (UpIter != UpperEnd) {		if (UpIter != UpperEnd) {
if (&*UpIter == I) {		if (&*UpIter == I) {
initScheduleData(I, ScheduleStart, nullptr, FirstLoadStoreInRegion);		initScheduleData(I, ScheduleStart, nullptr, FirstLoadStoreInRegion);
ScheduleStart = I;		ScheduleStart = I;
		if (isOneOf(OpValue, I) != I)
		RKSimon: (style) Put the CheckSheduleForI(I); on a newline.
		CheckSheduleForI(I);
		RKSimon: Do we need the (void)?
DEBUG(dbgs() << "SLP: extend schedule region start to " << *I << "\n");		DEBUG(dbgs() << "SLP: extend schedule region start to " << *I << "\n");
return true;		return true;
}		}
UpIter++;		UpIter++;
}		}
if (DownIter != LowerEnd) {		if (DownIter != LowerEnd) {
if (&*DownIter == I) {		if (&*DownIter == I) {
initScheduleData(ScheduleEnd, I->getNextNode(), LastLoadStoreInRegion,		initScheduleData(ScheduleEnd, I->getNextNode(), LastLoadStoreInRegion,
nullptr);		nullptr);
ScheduleEnd = I->getNextNode();		ScheduleEnd = I->getNextNode();
		if (isOneOf(OpValue, I) != I)
		RKSimon: (style) Put the CheckSheduleForI(I); on a newline.
		CheckSheduleForI(I);
		RKSimon: Do we need the (void)?
assert(ScheduleEnd && "tried to vectorize a TerminatorInst?");		assert(ScheduleEnd && "tried to vectorize a TerminatorInst?");
DEBUG(dbgs() << "SLP: extend schedule region end to " << *I << "\n");		DEBUG(dbgs() << "SLP: extend schedule region end to " << *I << "\n");
return true;		return true;
}		}
DownIter++;		DownIter++;
}		}
assert((UpIter != UpperEnd || DownIter != LowerEnd) &&		assert((UpIter != UpperEnd || DownIter != LowerEnd) &&
"instruction not found in block");		"instruction not found in block");
}		}
return true;		return true;
}		}
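
Note on the search loop above: `extendSchedulingRegion` steps outward from the current region in both directions at once, because it does not know whether the new instruction lies above or below it, and it gives up once a size limit is exceeded. A minimal standalone sketch of that idea over a `std::list<int>` follows. `searchUpAndDown`, `Found`, and `MaxSteps` are hypothetical names that do not appear in the patch, and the sketch only reports the direction instead of extending a region.

```cpp
#include <cassert>
#include <iterator>
#include <list>

enum class Found { Above, Below, NotFound };

// Walks outward from the half-open region [Start, End), one step per
// direction per iteration, and reports on which side Target was found.
// MaxSteps plays the role of a region size limit.
inline Found searchUpAndDown(const std::list<int> &Block,
                             std::list<int>::const_iterator Start,
                             std::list<int>::const_iterator End, int Target,
                             unsigned MaxSteps) {
  // A reverse iterator built from Start refers to the element just above it.
  auto Up = std::make_reverse_iterator(Start);
  auto UpperEnd = Block.crend();
  auto Down = End; // first element below the region
  auto LowerEnd = Block.cend();
  for (unsigned Steps = 0; Steps < MaxSteps; ++Steps) {
    if (Up == UpperEnd && Down == LowerEnd)
      return Found::NotFound; // both directions exhausted
    if (Up != UpperEnd) {
      if (*Up == Target)
        return Found::Above;
      ++Up;
    }
    if (Down != LowerEnd) {
      if (*Down == Target)
        return Found::Below;
      ++Down;
    }
  }
  return Found::NotFound; // step limit reached
}
```

Alternating the two directions keeps the number of visited elements proportional to the distance to the target, whichever side it is on.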

void BoUpSLP::BlockScheduling::initScheduleData(Instruction *FromI,		void BoUpSLP::BlockScheduling::initScheduleData(Instruction *FromI,
Instruction *ToI,		Instruction *ToI,
ScheduleData *PrevLoadStore,		ScheduleData *PrevLoadStore,
ScheduleData *NextLoadStore) {		ScheduleData *NextLoadStore) {
ScheduleData *CurrentLoadStore = PrevLoadStore;		ScheduleData *CurrentLoadStore = PrevLoadStore;
for (Instruction *I = FromI; I != ToI; I = I->getNextNode()) {		for (Instruction *I = FromI; I != ToI; I = I->getNextNode()) {
ScheduleData *SD = ScheduleDataMap[I];		ScheduleData *SD = ScheduleDataMap[I];
if (!SD) {		if (!SD) {
// Allocate a new ScheduleData for the instruction.		SD = allocateScheduleDataChunks();
if (ChunkPos >= ChunkSize) {
ScheduleDataChunks.push_back(
llvm::make_unique<ScheduleData[]>(ChunkSize));
ChunkPos = 0;
}
RKSimon: Is this an NFC pre-commit?
SD = &(ScheduleDataChunks.back()[ChunkPos++]);
ScheduleDataMap[I] = SD;		ScheduleDataMap[I] = SD;
SD->Inst = I;		SD->Inst = I;
}		}
assert(!isInSchedulingRegion(SD) &&		assert(!isInSchedulingRegion(SD) &&
"new ScheduleData already in scheduling region");		"new ScheduleData already in scheduling region");
SD->init(SchedulingRegionID);		SD->init(SchedulingRegionID, I);

if (I->mayReadOrWriteMemory()) {		if (I->mayReadOrWriteMemory()) {
// Update the linked list of memory accessing instructions.		// Update the linked list of memory accessing instructions.
if (CurrentLoadStore) {		if (CurrentLoadStore) {
CurrentLoadStore->NextLoadStore = SD;		CurrentLoadStore->NextLoadStore = SD;
} else {		} else {
FirstLoadStoreInRegion = SD;		FirstLoadStoreInRegion = SD;
}		}
Show All 24 Lines	while (!WorkList.empty()) {
while (BundleMember) {		while (BundleMember) {
assert(isInSchedulingRegion(BundleMember));		assert(isInSchedulingRegion(BundleMember));
if (!BundleMember->hasValidDependencies()) {		if (!BundleMember->hasValidDependencies()) {

DEBUG(dbgs() << "SLP: update deps of " << *BundleMember << "\n");		DEBUG(dbgs() << "SLP: update deps of " << *BundleMember << "\n");
BundleMember->Dependencies = 0;		BundleMember->Dependencies = 0;
BundleMember->resetUnscheduledDeps();		BundleMember->resetUnscheduledDeps();

		if (BundleMember->OpValue != BundleMember->Inst) {
		ScheduleData *UseSD = getScheduleData(BundleMember->Inst);
		if (UseSD && isInSchedulingRegion(UseSD->FirstInBundle)) {
		BundleMember->Dependencies++;
		ScheduleData *DestBundle = UseSD->FirstInBundle;
		if (!DestBundle->IsScheduled)
		BundleMember->incrementUnscheduledDeps(1);
		if (!DestBundle->hasValidDependencies())
		WorkList.push_back(DestBundle);
		}
		} else {
// Handle def-use chain dependencies.		// Handle def-use chain dependencies.
for (User *U : BundleMember->Inst->users()) {		for (User *U : BundleMember->Inst->users()) {
if (isa<Instruction>(U)) {		if (isa<Instruction>(U)) {
ScheduleData *UseSD = getScheduleData(U);		ScheduleData *UseSD = getScheduleData(U);
if (UseSD && isInSchedulingRegion(UseSD->FirstInBundle)) {		if (UseSD && isInSchedulingRegion(UseSD->FirstInBundle)) {
BundleMember->Dependencies++;		BundleMember->Dependencies++;
ScheduleData *DestBundle = UseSD->FirstInBundle;		ScheduleData *DestBundle = UseSD->FirstInBundle;
if (!DestBundle->IsScheduled)		if (!DestBundle->IsScheduled)
BundleMember->incrementUnscheduledDeps(1);		BundleMember->incrementUnscheduledDeps(1);
if (!DestBundle->hasValidDependencies())		if (!DestBundle->hasValidDependencies())
WorkList.push_back(DestBundle);		WorkList.push_back(DestBundle);
}		}
} else {		} else {
// I'm not sure if this can ever happen. But we need to be safe.		// I'm not sure if this can ever happen. But we need to be safe.
// This lets the instruction/bundle never be scheduled and		// This lets the instruction/bundle never be scheduled and
// eventually disable vectorization.		// eventually disable vectorization.
BundleMember->Dependencies++;		BundleMember->Dependencies++;
BundleMember->incrementUnscheduledDeps(1);		BundleMember->incrementUnscheduledDeps(1);
}		}
}		}
		}

// Handle the memory dependencies.		// Handle the memory dependencies.
ScheduleData *DepDest = BundleMember->NextLoadStore;		ScheduleData *DepDest = BundleMember->NextLoadStore;
if (DepDest) {		if (DepDest) {
Instruction *SrcInst = BundleMember->Inst;		Instruction *SrcInst = BundleMember->Inst;
MemoryLocation SrcLoc = getLocation(SrcInst, SLP->AA);		MemoryLocation SrcLoc = getLocation(SrcInst, SLP->AA);
bool SrcMayWrite = BundleMember->Inst->mayWriteToMemory();		bool SrcMayWrite = BundleMember->Inst->mayWriteToMemory();
unsigned numAliased = 0;		unsigned numAliased = 0;
unsigned DistToSrc = 1;		unsigned DistToSrc = 1;

while (DepDest) {		while (DepDest) {
assert(isInSchedulingRegion(DepDest));		assert(isInSchedulingRegion(DepDest));

// We have two limits to reduce the complexity:		// We have two limits to reduce the complexity:
// 1) AliasedCheckLimit: It's a small limit to reduce calls to		// 1) AliasedCheckLimit: It's a small limit to reduce calls to
// SLP->isAliased (which is the expensive part in this loop).		// SLP->isAliased (which is the expensive part in this loop).
// 2) MaxMemDepDistance: It's for very large blocks and it aborts		// 2) MaxMemDepDistance: It's for very large blocks and it aborts
// the whole loop (even if the loop is fast, it's quadratic).		// the whole loop (even if the loop is fast, it's quadratic).
		RKSimon: Remove these brackets; that can be done as an NFC commit right away and does not affect this patch.
// It's important for the loop break condition (see below) to		// It's important for the loop break condition (see below) to
// check this limit even between two read-only instructions.		// check this limit even between two read-only instructions.
if (DistToSrc >= MaxMemDepDistance ||		if (DistToSrc >= MaxMemDepDistance ||
((SrcMayWrite || DepDest->Inst->mayWriteToMemory()) &&		((SrcMayWrite || DepDest->Inst->mayWriteToMemory()) &&
(numAliased >= AliasedCheckLimit ||		(numAliased >= AliasedCheckLimit ||
SLP->isAliased(SrcLoc, SrcInst, DepDest->Inst)))) {		SLP->isAliased(SrcLoc, SrcInst, DepDest->Inst)))) {

// We increment the counter only if the locations are aliased		// We increment the counter only if the locations are aliased
Show All 40 Lines	while (!WorkList.empty()) {
}		}
}		}
}		}

void BoUpSLP::BlockScheduling::resetSchedule() {		void BoUpSLP::BlockScheduling::resetSchedule() {
assert(ScheduleStart &&		assert(ScheduleStart &&
"tried to reset schedule on block which has not been scheduled");		"tried to reset schedule on block which has not been scheduled");
for (Instruction *I = ScheduleStart; I != ScheduleEnd; I = I->getNextNode()) {		for (Instruction *I = ScheduleStart; I != ScheduleEnd; I = I->getNextNode()) {
ScheduleData *SD = getScheduleData(I);		doForAllOpcodes(I, [this](ScheduleData *SD) {
assert(isInSchedulingRegion(SD));		assert(isInSchedulingRegion(SD));
SD->IsScheduled = false;		SD->IsScheduled = false;
SD->resetUnscheduledDeps();		SD->resetUnscheduledDeps();
		});
}		}
ReadyInsts.clear();		ReadyInsts.clear();
}		}

void BoUpSLP::scheduleBlock(BlockScheduling *BS) {		void BoUpSLP::scheduleBlock(BlockScheduling *BS) {

if (!BS->ScheduleStart)		if (!BS->ScheduleStart)
return;		return;
Show All 13 Lines	void BoUpSLP::scheduleBlock(BlockScheduling *BS) {
std::set<ScheduleData *, ScheduleDataCompare> ReadyInsts;		std::set<ScheduleData *, ScheduleDataCompare> ReadyInsts;

// Ensure that all dependency data is updated and fill the ready-list with		// Ensure that all dependency data is updated and fill the ready-list with
// initial instructions.		// initial instructions.
int Idx = 0;		int Idx = 0;
int NumToSchedule = 0;		int NumToSchedule = 0;
for (auto *I = BS->ScheduleStart; I != BS->ScheduleEnd;		for (auto *I = BS->ScheduleStart; I != BS->ScheduleEnd;
I = I->getNextNode()) {		I = I->getNextNode()) {
ScheduleData *SD = BS->getScheduleData(I);		BS->doForAllOpcodes(I, [this, &Idx, &NumToSchedule, BS](ScheduleData *SD) {
assert(		assert(SD->isPartOfBundle() ==
SD->isPartOfBundle() == (getTreeEntry(SD->Inst) != nullptr) &&		(getTreeEntry(SD->Inst, SD->OpValue) != nullptr) &&
"scheduler and vectorizer have different opinion on what is a bundle");		"scheduler and vectorizer have different opinion on what is a "
		"bundle");
SD->FirstInBundle->SchedulingPriority = Idx++;		SD->FirstInBundle->SchedulingPriority = Idx++;
if (SD->isSchedulingEntity()) {		if (SD->isSchedulingEntity()) {
BS->calculateDependencies(SD, false, this);		BS->calculateDependencies(SD, false, this);
NumToSchedule++;		NumToSchedule++;
}		}
		});
}		}
BS->initialFillReadyList(ReadyInsts);		BS->initialFillReadyList(ReadyInsts);

Instruction *LastScheduledInst = BS->ScheduleEnd;		Instruction *LastScheduledInst = BS->ScheduleEnd;

// Do the "real" scheduling.		// Do the "real" scheduling.
while (!ReadyInsts.empty()) {		while (!ReadyInsts.empty()) {
ScheduleData *picked = *ReadyInsts.begin();		ScheduleData *picked = *ReadyInsts.begin();
ReadyInsts.erase(ReadyInsts.begin());		ReadyInsts.erase(ReadyInsts.begin());

// Move the scheduled instruction(s) to their dedicated places, if not		// Move the scheduled instruction(s) to their dedicated places, if not
// there yet.		// there yet.
ScheduleData *BundleMember = picked;		ScheduleData *BundleMember = picked;
while (BundleMember) {		while (BundleMember) {
Instruction *pickedInst = BundleMember->Inst;		Instruction *pickedInst = BundleMember->Inst;
if (LastScheduledInst->getNextNode() != pickedInst) {		bool SameOpcode = pickedInst == BundleMember->OpValue;
		if (SameOpcode && LastScheduledInst->getNextNode() != pickedInst) {
BS->BB->getInstList().remove(pickedInst);		BS->BB->getInstList().remove(pickedInst);
BS->BB->getInstList().insert(LastScheduledInst->getIterator(),		BS->BB->getInstList().insert(LastScheduledInst->getIterator(),
pickedInst);		pickedInst);
}		}
		ABataev: Remove this, it is not needed.
		dtemirbulatov: Please check the schedule-bundle1.ll testcase; without this change scheduling is not correct.
		if (SameOpcode)
LastScheduledInst = pickedInst;		LastScheduledInst = pickedInst;
		RKSimon: `if (pickedInst == BundleMember->OpValue) { if (LastScheduledInst->getNextNode() != pickedInst) { BS->BB->getInstList().remove(pickedInst); BS->BB->getInstList().insert(LastScheduledInst->getIterator(), pickedInst); } LastScheduledInst = pickedInst; }`
BundleMember = BundleMember->NextInBundle;		BundleMember = BundleMember->NextInBundle;
}		}
		ABataev: 1. Use `SmallSet` instead. 2. Why can't you use `SmallVector` instead?
		dtemirbulatov: Replaced with SmallSet. We could not use a vector here, since it is not possible to have multiple entries for a single Instruction.
		ABataev: The main problem with this code is that you're checking only one level of dependency. What if you have a deeper dependency, at level 2, 3 or more? Will it work? The code itself is very complex and, if we're going to keep this solution, it must be outlined into a separate function.
		dtemirbulatov: Yes, I thought the same. I have another, heavier version that uses Dependency Analysis, but I have not yet seen an example where the current implementation is not enough. I have 2 or 3 testcases for the issue so far. I will think about the problem again.
		dtemirbulatov: Level 2 or 3 dependencies are probably OK, since they are not encapsulated in a single operation.

ABataev: Restore this line.
BS->schedule(picked, ReadyInsts);		BS->schedule(picked, ReadyInsts);
NumToSchedule--;		NumToSchedule--;
		ABataev: 1. Wrong formatting. Use `clang-format`. 2. Enclose in braces.
		dtemirbulatov: Done.
}		}
assert(NumToSchedule == 0 && "could not schedule all instructions");		assert(NumToSchedule == 0 && "could not schedule all instructions");

		ABataev: `pickedInst` -> `PickedInst`
// Avoid duplicate scheduling of the block.		// Avoid duplicate scheduling of the block.
		dtemirbulatov: And these dominance errors occur rarely.
BS->ScheduleStart = nullptr;		BS->ScheduleStart = nullptr;
}		}

unsigned BoUpSLP::getVectorElementSize(Value *V) {		unsigned BoUpSLP::getVectorElementSize(Value *V) {
// If V is a store, just return the width of the stored value without		// If V is a store, just return the width of the stored value without
		ABataev: Remove it; it does not do anything.
// traversing the expression tree. This is the common case.		// traversing the expression tree. This is the common case.
if (auto *Store = dyn_cast<StoreInst>(V))		if (auto *Store = dyn_cast<StoreInst>(V))
return DL->getTypeSizeInBits(Store->getValueOperand()->getType());		return DL->getTypeSizeInBits(Store->getValueOperand()->getType());

// If V is not a store, we can traverse the expression tree to find loads		// If V is not a store, we can traverse the expression tree to find loads
// that feed it. The type of the loaded value may indicate a more suitable		// that feed it. The type of the loaded value may indicate a more suitable
// width than V's type. We want to base the vector element size on the width		// width than V's type. We want to base the vector element size on the width
// of memory operations where possible.		// of memory operations where possible.
▲ Show 20 Lines • Show All 1,027 Lines • ▼ Show 20 Lines	while (i < NumReducedVals - ReduxWidth + 1 && ReduxWidth > 2) {

// Emit a reduction.		// Emit a reduction.
Value *ReducedSubTree =		Value *ReducedSubTree =
emitReduction(VectorizedRoot, Builder, ReduxWidth, ReductionOps, TTI);		emitReduction(VectorizedRoot, Builder, ReduxWidth, ReductionOps, TTI);
if (VectorizedTree) {		if (VectorizedTree) {
Builder.SetCurrentDebugLocation(Loc);		Builder.SetCurrentDebugLocation(Loc);
VectorizedTree = Builder.CreateBinOp(ReductionOpcode, VectorizedTree,		VectorizedTree = Builder.CreateBinOp(ReductionOpcode, VectorizedTree,
ReducedSubTree, "bin.rdx");		ReducedSubTree, "bin.rdx");
propagateIRFlags(VectorizedTree, ReductionOps);		InstructionsState S = getSameOpcode(ReductionOps);
		assert(!S.IsAltShuffle && "Unexpected alternate opcode");
		RKSimon: Always add an assert message.
		propagateIRFlags(VectorizedTree, ReductionOps, S.OpValue);
} else		} else
VectorizedTree = ReducedSubTree;		VectorizedTree = ReducedSubTree;
i += ReduxWidth;		i += ReduxWidth;
ReduxWidth = PowerOf2Floor(NumReducedVals - i);		ReduxWidth = PowerOf2Floor(NumReducedVals - i);
}		}

if (VectorizedTree) {		if (VectorizedTree) {
// Finish the reduction.		// Finish the reduction.
for (; i < NumReducedVals; ++i) {		for (; i < NumReducedVals; ++i) {
auto *I = cast<Instruction>(ReducedVals[i]);		auto *I = cast<Instruction>(ReducedVals[i]);
Builder.SetCurrentDebugLocation(I->getDebugLoc());		Builder.SetCurrentDebugLocation(I->getDebugLoc());
VectorizedTree =		VectorizedTree =
Builder.CreateBinOp(ReductionOpcode, VectorizedTree, I);		Builder.CreateBinOp(ReductionOpcode, VectorizedTree, I);
propagateIRFlags(VectorizedTree, ReductionOps);		InstructionsState S = getSameOpcode(ReductionOps);
		assert(!S.IsAltShuffle && "Unexpected alternate opcode");
		RKSimon: Always add an assert message.
		propagateIRFlags(VectorizedTree, ReductionOps, S.OpValue);
}		}
for (auto &Pair : ExternallyUsedValues) {		for (auto &Pair : ExternallyUsedValues) {
assert(!Pair.second.empty() &&		assert(!Pair.second.empty() &&
"At least one DebugLoc must be inserted");		"At least one DebugLoc must be inserted");
// Add each externally used value to the final reduction.		// Add each externally used value to the final reduction.
for (auto *I : Pair.second) {		for (auto *I : Pair.second) {
Builder.SetCurrentDebugLocation(I->getDebugLoc());		Builder.SetCurrentDebugLocation(I->getDebugLoc());
VectorizedTree = Builder.CreateBinOp(ReductionOpcode, VectorizedTree,		VectorizedTree = Builder.CreateBinOp(ReductionOpcode, VectorizedTree,
Pair.first, "bin.extra");		Pair.first, "bin.extra");
propagateIRFlags(VectorizedTree, I);		InstructionsState S = getSameOpcode(I);
		assert(!S.IsAltShuffle && "Unexpected alternate opcode");
		RKSimon: Always add an assert message.
		propagateIRFlags(VectorizedTree, I, S.OpValue);
}		}
}		}
// Update users.		// Update users.
ReductionRoot->replaceAllUsesWith(VectorizedTree);		ReductionRoot->replaceAllUsesWith(VectorizedTree);
}		}
return VectorizedTree != nullptr;		return VectorizedTree != nullptr;
}		}

▲ Show 20 Lines • Show All 49 Lines • ▼ Show 20 Lines	for (unsigned i = ReduxWidth / 2; i != 0; i >>= 1) {

Value *LeftShuf = Builder.CreateShuffleVector(		Value *LeftShuf = Builder.CreateShuffleVector(
TmpVec, UndefValue::get(TmpVec->getType()), LeftMask, "rdx.shuf.l");		TmpVec, UndefValue::get(TmpVec->getType()), LeftMask, "rdx.shuf.l");
Value *RightShuf = Builder.CreateShuffleVector(		Value *RightShuf = Builder.CreateShuffleVector(
TmpVec, UndefValue::get(TmpVec->getType()), (RightMask),		TmpVec, UndefValue::get(TmpVec->getType()), (RightMask),
"rdx.shuf.r");		"rdx.shuf.r");
TmpVec =		TmpVec =
Builder.CreateBinOp(ReductionOpcode, LeftShuf, RightShuf, "bin.rdx");		Builder.CreateBinOp(ReductionOpcode, LeftShuf, RightShuf, "bin.rdx");
propagateIRFlags(TmpVec, RedOps);		InstructionsState S = getSameOpcode(RedOps);
		assert(!S.IsAltShuffle && "Unexpected alternate opcode");
		RKSimon: Always add an assert message.
		propagateIRFlags(TmpVec, RedOps, S.OpValue);
}		}

// The result is in the first element of the vector.		// The result is in the first element of the vector.
return Builder.CreateExtractElement(TmpVec, Builder.getInt32(0));		return Builder.CreateExtractElement(TmpVec, Builder.getInt32(0));
}		}
};		};
} // end anonymous namespace		} // end anonymous namespace

▲ Show 20 Lines • Show All 517 Lines • Show Last 20 Lines

test/Transforms/SLPVectorizer/X86/vect_copyable_in_binops.ll

Show All 37 Lines	entry:
store i32 %add9, i32* %incdec.ptr7, align 4		store i32 %add9, i32* %incdec.ptr7, align 4
ret void		ret void
}		}

define void @add1(i32* noalias %dst, i32* noalias %src) {		define void @add1(i32* noalias %dst, i32* noalias %src) {
; CHECK-LABEL: @add1(		; CHECK-LABEL: @add1(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds i32, i32* [[SRC:%.*]], i64 1		; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds i32, i32* [[SRC:%.*]], i64 1
; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* [[SRC]], align 4
; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds i32, i32* [[DST:%.*]], i64 1		; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds i32, i32* [[DST:%.*]], i64 1
; CHECK-NEXT: store i32 [[TMP0]], i32* [[DST]], align 4
; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 2		; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 2
; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[INCDEC_PTR]], align 4
; CHECK-NEXT: [[ADD3:%.*]] = add nsw i32 [[TMP1]], 1
; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 2		; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 2
; CHECK-NEXT: store i32 [[ADD3]], i32* [[INCDEC_PTR1]], align 4
; CHECK-NEXT: [[INCDEC_PTR5:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 3		; CHECK-NEXT: [[INCDEC_PTR5:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 3
; CHECK-NEXT: [[TMP2:%.*]] = load i32, i32* [[INCDEC_PTR2]], align 4
; CHECK-NEXT: [[ADD6:%.*]] = add nsw i32 [[TMP2]], 2
; CHECK-NEXT: [[INCDEC_PTR7:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 3		; CHECK-NEXT: [[INCDEC_PTR7:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 3
; CHECK-NEXT: store i32 [[ADD6]], i32* [[INCDEC_PTR4]], align 4		; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[SRC]] to <4 x i32>*
; CHECK-NEXT: [[TMP3:%.*]] = load i32, i32* [[INCDEC_PTR5]], align 4		; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4
; CHECK-NEXT: [[ADD9:%.*]] = add nsw i32 [[TMP3]], 3		; CHECK-NEXT: [[TMP2:%.*]] = add nsw <4 x i32> <i32 0, i32 1, i32 2, i32 3>, [[TMP1]]
; CHECK-NEXT: store i32 [[ADD9]], i32* [[INCDEC_PTR7]], align 4		; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[DST]] to <4 x i32>*
		; CHECK-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* [[TMP3]], align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
%incdec.ptr = getelementptr inbounds i32, i32* %src, i64 1		%incdec.ptr = getelementptr inbounds i32, i32* %src, i64 1
%0 = load i32, i32* %src, align 4		%0 = load i32, i32* %src, align 4
%incdec.ptr1 = getelementptr inbounds i32, i32* %dst, i64 1		%incdec.ptr1 = getelementptr inbounds i32, i32* %dst, i64 1
store i32 %0, i32* %dst, align 4		store i32 %0, i32* %dst, align 4
%incdec.ptr2 = getelementptr inbounds i32, i32* %src, i64 2		%incdec.ptr2 = getelementptr inbounds i32, i32* %src, i64 2
[11 lines not shown]
store i32 %add9, i32* %incdec.ptr7, align 4		store i32 %add9, i32* %incdec.ptr7, align 4
ret void		ret void
}		}

define void @sub0(i32* noalias %dst, i32* noalias %src) {		define void @sub0(i32* noalias %dst, i32* noalias %src) {
; CHECK-LABEL: @sub0(		; CHECK-LABEL: @sub0(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds i32, i32* [[SRC:%.*]], i64 1		; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds i32, i32* [[SRC:%.*]], i64 1
; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* [[SRC]], align 4
; CHECK-NEXT: [[SUB:%.*]] = add nsw i32 [[TMP0]], -1
; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds i32, i32* [[DST:%.*]], i64 1		; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds i32, i32* [[DST:%.*]], i64 1
; CHECK-NEXT: store i32 [[SUB]], i32* [[DST]], align 4
; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 2		; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 2
; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[INCDEC_PTR]], align 4
; CHECK-NEXT: [[INCDEC_PTR3:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 2		; CHECK-NEXT: [[INCDEC_PTR3:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 2
; CHECK-NEXT: store i32 [[TMP1]], i32* [[INCDEC_PTR1]], align 4
; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 3		; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 3
; CHECK-NEXT: [[TMP2:%.*]] = load i32, i32* [[INCDEC_PTR2]], align 4
; CHECK-NEXT: [[SUB5:%.*]] = add nsw i32 [[TMP2]], -2
; CHECK-NEXT: [[INCDEC_PTR6:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 3		; CHECK-NEXT: [[INCDEC_PTR6:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 3
; CHECK-NEXT: store i32 [[SUB5]], i32* [[INCDEC_PTR3]], align 4		; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[SRC]] to <4 x i32>*
; CHECK-NEXT: [[TMP3:%.*]] = load i32, i32* [[INCDEC_PTR4]], align 4		; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4
; CHECK-NEXT: [[SUB8:%.*]] = add nsw i32 [[TMP3]], -3		; CHECK-NEXT: [[TMP2:%.*]] = add nsw <4 x i32> <i32 -1, i32 0, i32 -2, i32 -3>, [[TMP1]]
; CHECK-NEXT: store i32 [[SUB8]], i32* [[INCDEC_PTR6]], align 4		; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[DST]] to <4 x i32>*
		; CHECK-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* [[TMP3]], align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
%incdec.ptr = getelementptr inbounds i32, i32* %src, i64 1		%incdec.ptr = getelementptr inbounds i32, i32* %src, i64 1
%0 = load i32, i32* %src, align 4		%0 = load i32, i32* %src, align 4
%sub = add nsw i32 %0, -1		%sub = add nsw i32 %0, -1
%incdec.ptr1 = getelementptr inbounds i32, i32* %dst, i64 1		%incdec.ptr1 = getelementptr inbounds i32, i32* %dst, i64 1
store i32 %sub, i32* %dst, align 4		store i32 %sub, i32* %dst, align 4
[87 lines not shown]
store i32 %sub9, i32* %incdec.ptr7, align 4		store i32 %sub9, i32* %incdec.ptr7, align 4
ret void		ret void
}		}

define void @addsub0(i32* noalias %dst, i32* noalias %src) {		define void @addsub0(i32* noalias %dst, i32* noalias %src) {
; CHECK-LABEL: @addsub0(		; CHECK-LABEL: @addsub0(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds i32, i32* [[SRC:%.*]], i64 1		; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds i32, i32* [[SRC:%.*]], i64 1
; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* [[SRC]], align 4
; CHECK-NEXT: [[SUB:%.*]] = add nsw i32 [[TMP0]], -1
; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds i32, i32* [[DST:%.*]], i64 1		; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds i32, i32* [[DST:%.*]], i64 1
; CHECK-NEXT: store i32 [[SUB]], i32* [[DST]], align 4
; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 2		; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 2
; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[INCDEC_PTR]], align 4
; CHECK-NEXT: [[INCDEC_PTR3:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 2		; CHECK-NEXT: [[INCDEC_PTR3:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 2
; CHECK-NEXT: store i32 [[TMP1]], i32* [[INCDEC_PTR1]], align 4
; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 3		; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 3
; CHECK-NEXT: [[TMP2:%.*]] = load i32, i32* [[INCDEC_PTR2]], align 4
; CHECK-NEXT: [[SUB5:%.*]] = add nsw i32 [[TMP2]], -2
; CHECK-NEXT: [[INCDEC_PTR6:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 3		; CHECK-NEXT: [[INCDEC_PTR6:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 3
; CHECK-NEXT: store i32 [[SUB5]], i32* [[INCDEC_PTR3]], align 4		; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[SRC]] to <4 x i32>*
; CHECK-NEXT: [[TMP3:%.*]] = load i32, i32* [[INCDEC_PTR4]], align 4		; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4
; CHECK-NEXT: [[SUB8:%.*]] = sub nsw i32 [[TMP3]], -3		; CHECK-NEXT: [[TMP2:%.*]] = add nsw <4 x i32> [[TMP1]], <i32 -1, i32 0, i32 -2, i32 -3>
; CHECK-NEXT: store i32 [[SUB8]], i32* [[INCDEC_PTR6]], align 4		; CHECK-NEXT: [[TMP3:%.*]] = sub nsw <4 x i32> [[TMP1]], <i32 -1, i32 0, i32 -2, i32 -3>
		; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> [[TMP3]], <4 x i32> <i32 0, i32 5, i32 2, i32 7>
		; CHECK-NEXT: [[TMP5:%.*]] = bitcast i32* [[DST]] to <4 x i32>*
		; CHECK-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* [[TMP5]], align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
%incdec.ptr = getelementptr inbounds i32, i32* %src, i64 1		%incdec.ptr = getelementptr inbounds i32, i32* %src, i64 1
%0 = load i32, i32* %src, align 4		%0 = load i32, i32* %src, align 4
%sub = add nsw i32 %0, -1		%sub = add nsw i32 %0, -1
%incdec.ptr1 = getelementptr inbounds i32, i32* %dst, i64 1		%incdec.ptr1 = getelementptr inbounds i32, i32* %dst, i64 1
store i32 %sub, i32* %dst, align 4		store i32 %sub, i32* %dst, align 4
[11 lines not shown]
store i32 %sub8, i32* %incdec.ptr6, align 4		store i32 %sub8, i32* %incdec.ptr6, align 4
ret void		ret void
}		}

define void @addsub1(i32* noalias %dst, i32* noalias %src) {		define void @addsub1(i32* noalias %dst, i32* noalias %src) {
; CHECK-LABEL: @addsub1(		; CHECK-LABEL: @addsub1(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds i32, i32* [[SRC:%.*]], i64 1		; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds i32, i32* [[SRC:%.*]], i64 1
; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* [[SRC]], align 4
; CHECK-NEXT: [[SUB:%.*]] = add nsw i32 [[TMP0]], -1
; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds i32, i32* [[DST:%.*]], i64 1		; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds i32, i32* [[DST:%.*]], i64 1
; CHECK-NEXT: store i32 [[SUB]], i32* [[DST]], align 4
; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 2		; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 2
; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[INCDEC_PTR]], align 4
; CHECK-NEXT: [[SUB1:%.*]] = sub nsw i32 [[TMP1]], -1
; CHECK-NEXT: [[INCDEC_PTR3:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 2		; CHECK-NEXT: [[INCDEC_PTR3:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 2
; CHECK-NEXT: store i32 [[SUB1]], i32* [[INCDEC_PTR1]], align 4
; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 3		; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 3
; CHECK-NEXT: [[TMP2:%.*]] = load i32, i32* [[INCDEC_PTR2]], align 4
; CHECK-NEXT: [[INCDEC_PTR6:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 3		; CHECK-NEXT: [[INCDEC_PTR6:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 3
; CHECK-NEXT: store i32 [[TMP2]], i32* [[INCDEC_PTR3]], align 4		; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[SRC]] to <4 x i32>*
; CHECK-NEXT: [[TMP3:%.*]] = load i32, i32* [[INCDEC_PTR4]], align 4		; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4
; CHECK-NEXT: [[SUB8:%.*]] = sub nsw i32 [[TMP3]], -3		; CHECK-NEXT: [[TMP2:%.*]] = add nsw <4 x i32> [[TMP1]], <i32 -1, i32 -1, i32 0, i32 -3>
; CHECK-NEXT: store i32 [[SUB8]], i32* [[INCDEC_PTR6]], align 4		; CHECK-NEXT: [[TMP3:%.*]] = sub nsw <4 x i32> [[TMP1]], <i32 -1, i32 -1, i32 0, i32 -3>
		; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> [[TMP3]], <4 x i32> <i32 0, i32 5, i32 2, i32 7>
		; CHECK-NEXT: [[TMP5:%.*]] = bitcast i32* [[DST]] to <4 x i32>*
		; CHECK-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* [[TMP5]], align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
%incdec.ptr = getelementptr inbounds i32, i32* %src, i64 1		%incdec.ptr = getelementptr inbounds i32, i32* %src, i64 1
%0 = load i32, i32* %src, align 4		%0 = load i32, i32* %src, align 4
%sub = add nsw i32 %0, -1		%sub = add nsw i32 %0, -1
%incdec.ptr1 = getelementptr inbounds i32, i32* %dst, i64 1		%incdec.ptr1 = getelementptr inbounds i32, i32* %dst, i64 1
store i32 %sub, i32* %dst, align 4		store i32 %sub, i32* %dst, align 4
[11 lines not shown]
store i32 %sub8, i32* %incdec.ptr6, align 4		store i32 %sub8, i32* %incdec.ptr6, align 4
ret void		ret void
}		}

define void @mul(i32* noalias %dst, i32* noalias %src) {		define void @mul(i32* noalias %dst, i32* noalias %src) {
; CHECK-LABEL: @mul(		; CHECK-LABEL: @mul(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds i32, i32* [[SRC:%.*]], i64 1		; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds i32, i32* [[SRC:%.*]], i64 1
; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* [[SRC]], align 4
; CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[TMP0]], 257
; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds i32, i32* [[DST:%.*]], i64 1		; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds i32, i32* [[DST:%.*]], i64 1
; CHECK-NEXT: store i32 [[MUL]], i32* [[DST]], align 4
; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 2		; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 2
; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[INCDEC_PTR]], align 4
; CHECK-NEXT: [[MUL3:%.*]] = mul nsw i32 [[TMP1]], -3
; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 2		; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 2
; CHECK-NEXT: store i32 [[MUL3]], i32* [[INCDEC_PTR1]], align 4
; CHECK-NEXT: [[INCDEC_PTR5:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 3		; CHECK-NEXT: [[INCDEC_PTR5:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 3
; CHECK-NEXT: [[TMP2:%.*]] = load i32, i32* [[INCDEC_PTR2]], align 4
; CHECK-NEXT: [[INCDEC_PTR7:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 3		; CHECK-NEXT: [[INCDEC_PTR7:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 3
; CHECK-NEXT: store i32 [[TMP2]], i32* [[INCDEC_PTR4]], align 4		; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[SRC]] to <4 x i32>*
; CHECK-NEXT: [[TMP3:%.*]] = load i32, i32* [[INCDEC_PTR5]], align 4		; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4
; CHECK-NEXT: [[MUL9:%.*]] = mul nsw i32 [[TMP3]], -9		; CHECK-NEXT: [[TMP2:%.*]] = mul nsw <4 x i32> <i32 257, i32 -3, i32 1, i32 -9>, [[TMP1]]
; CHECK-NEXT: store i32 [[MUL9]], i32* [[INCDEC_PTR7]], align 4		; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[DST]] to <4 x i32>*
		; CHECK-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* [[TMP3]], align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
%incdec.ptr = getelementptr inbounds i32, i32* %src, i64 1		%incdec.ptr = getelementptr inbounds i32, i32* %src, i64 1
%0 = load i32, i32* %src, align 4		%0 = load i32, i32* %src, align 4
%mul = mul nsw i32 %0, 257		%mul = mul nsw i32 %0, 257
%incdec.ptr1 = getelementptr inbounds i32, i32* %dst, i64 1		%incdec.ptr1 = getelementptr inbounds i32, i32* %dst, i64 1
store i32 %mul, i32* %dst, align 4		store i32 %mul, i32* %dst, align 4
[11 lines not shown]
store i32 %mul9, i32* %incdec.ptr7, align 4		store i32 %mul9, i32* %incdec.ptr7, align 4
ret void		ret void
}		}

define void @shl0(i32* noalias %dst, i32* noalias %src) {		define void @shl0(i32* noalias %dst, i32* noalias %src) {
; CHECK-LABEL: @shl0(		; CHECK-LABEL: @shl0(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds i32, i32* [[SRC:%.*]], i64 1		; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds i32, i32* [[SRC:%.*]], i64 1
; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* [[SRC]], align 4
; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds i32, i32* [[DST:%.*]], i64 1		; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds i32, i32* [[DST:%.*]], i64 1
; CHECK-NEXT: store i32 [[TMP0]], i32* [[DST]], align 4
; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 2		; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 2
; CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[INCDEC_PTR]], align 4
; CHECK-NEXT: [[SHL:%.*]] = shl i32 [[TMP1]], 1
; CHECK-NEXT: [[INCDEC_PTR3:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 2		; CHECK-NEXT: [[INCDEC_PTR3:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 2
; CHECK-NEXT: store i32 [[SHL]], i32* [[INCDEC_PTR1]], align 4
; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 3		; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 3
; CHECK-NEXT: [[TMP2:%.*]] = load i32, i32* [[INCDEC_PTR2]], align 4
; CHECK-NEXT: [[SHL5:%.*]] = shl i32 [[TMP2]], 2
; CHECK-NEXT: [[INCDEC_PTR6:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 3		; CHECK-NEXT: [[INCDEC_PTR6:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 3
; CHECK-NEXT: store i32 [[SHL5]], i32* [[INCDEC_PTR3]], align 4		; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[SRC]] to <4 x i32>*
; CHECK-NEXT: [[TMP3:%.*]] = load i32, i32* [[INCDEC_PTR4]], align 4		; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4
; CHECK-NEXT: [[SHL8:%.*]] = shl i32 [[TMP3]], 3		; CHECK-NEXT: [[TMP2:%.*]] = shl <4 x i32> [[TMP1]], <i32 0, i32 1, i32 2, i32 3>
; CHECK-NEXT: store i32 [[SHL8]], i32* [[INCDEC_PTR6]], align 4		; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[DST]] to <4 x i32>*
		; CHECK-NEXT: store <4 x i32> [[TMP2]], <4 x i32>* [[TMP3]], align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
%incdec.ptr = getelementptr inbounds i32, i32* %src, i64 1		%incdec.ptr = getelementptr inbounds i32, i32* %src, i64 1
%0 = load i32, i32* %src, align 4		%0 = load i32, i32* %src, align 4
%incdec.ptr1 = getelementptr inbounds i32, i32* %dst, i64 1		%incdec.ptr1 = getelementptr inbounds i32, i32* %dst, i64 1
store i32 %0, i32* %dst, align 4		store i32 %0, i32* %dst, align 4
%incdec.ptr2 = getelementptr inbounds i32, i32* %src, i64 2		%incdec.ptr2 = getelementptr inbounds i32, i32* %src, i64 2
[87 lines not shown]
store float %add9, float* %incdec.ptr7, align 4		store float %add9, float* %incdec.ptr7, align 4
ret void		ret void
}		}

define void @add1f(float* noalias %dst, float* noalias %src) {		define void @add1f(float* noalias %dst, float* noalias %src) {
; CHECK-LABEL: @add1f(		; CHECK-LABEL: @add1f(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds float, float* [[SRC:%.*]], i64 1		; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds float, float* [[SRC:%.*]], i64 1
; CHECK-NEXT: [[TMP0:%.*]] = load float, float* [[SRC]], align 4
; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds float, float* [[DST:%.*]], i64 1		; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds float, float* [[DST:%.*]], i64 1
; CHECK-NEXT: store float [[TMP0]], float* [[DST]], align 4
; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 2		; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 2
; CHECK-NEXT: [[TMP1:%.*]] = load float, float* [[INCDEC_PTR]], align 4
; CHECK-NEXT: [[ADD3:%.*]] = fadd fast float [[TMP1]], 1.000000e+00
; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds float, float* [[DST]], i64 2		; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds float, float* [[DST]], i64 2
; CHECK-NEXT: store float [[ADD3]], float* [[INCDEC_PTR1]], align 4
; CHECK-NEXT: [[INCDEC_PTR5:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 3		; CHECK-NEXT: [[INCDEC_PTR5:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 3
; CHECK-NEXT: [[TMP2:%.*]] = load float, float* [[INCDEC_PTR2]], align 4
; CHECK-NEXT: [[ADD6:%.*]] = fadd fast float [[TMP2]], 2.000000e+00
; CHECK-NEXT: [[INCDEC_PTR7:%.*]] = getelementptr inbounds float, float* [[DST]], i64 3		; CHECK-NEXT: [[INCDEC_PTR7:%.*]] = getelementptr inbounds float, float* [[DST]], i64 3
; CHECK-NEXT: store float [[ADD6]], float* [[INCDEC_PTR4]], align 4		; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[SRC]] to <4 x float>*
; CHECK-NEXT: [[TMP3:%.*]] = load float, float* [[INCDEC_PTR5]], align 4		; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* [[TMP0]], align 4
; CHECK-NEXT: [[ADD9:%.*]] = fadd fast float [[TMP3]], 3.000000e+00		; CHECK-NEXT: [[TMP2:%.*]] = fadd fast <4 x float> <float 0.000000e+00, float 1.000000e+00, float 2.000000e+00, float 3.000000e+00>, [[TMP1]]
; CHECK-NEXT: store float [[ADD9]], float* [[INCDEC_PTR7]], align 4		; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[DST]] to <4 x float>*
		; CHECK-NEXT: store <4 x float> [[TMP2]], <4 x float>* [[TMP3]], align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
%incdec.ptr = getelementptr inbounds float, float* %src, i64 1		%incdec.ptr = getelementptr inbounds float, float* %src, i64 1
%0 = load float, float* %src, align 4		%0 = load float, float* %src, align 4
%incdec.ptr1 = getelementptr inbounds float, float* %dst, i64 1		%incdec.ptr1 = getelementptr inbounds float, float* %dst, i64 1
store float %0, float* %dst, align 4		store float %0, float* %dst, align 4
%incdec.ptr2 = getelementptr inbounds float, float* %src, i64 2		%incdec.ptr2 = getelementptr inbounds float, float* %src, i64 2
[11 lines not shown]
store float %add9, float* %incdec.ptr7, align 4		store float %add9, float* %incdec.ptr7, align 4
ret void		ret void
}		}

define void @sub0f(float* noalias %dst, float* noalias %src) {		define void @sub0f(float* noalias %dst, float* noalias %src) {
; CHECK-LABEL: @sub0f(		; CHECK-LABEL: @sub0f(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds float, float* [[SRC:%.*]], i64 1		; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds float, float* [[SRC:%.*]], i64 1
; CHECK-NEXT: [[TMP0:%.*]] = load float, float* [[SRC]], align 4
; CHECK-NEXT: [[ADD:%.*]] = fadd fast float [[TMP0]], -1.000000e+00
; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds float, float* [[DST:%.*]], i64 1		; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds float, float* [[DST:%.*]], i64 1
; CHECK-NEXT: store float [[ADD]], float* [[DST]], align 4
; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 2		; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 2
; CHECK-NEXT: [[TMP1:%.*]] = load float, float* [[INCDEC_PTR]], align 4
; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds float, float* [[DST]], i64 2		; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds float, float* [[DST]], i64 2
; CHECK-NEXT: store float [[TMP1]], float* [[INCDEC_PTR1]], align 4
; CHECK-NEXT: [[INCDEC_PTR5:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 3		; CHECK-NEXT: [[INCDEC_PTR5:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 3
; CHECK-NEXT: [[TMP2:%.*]] = load float, float* [[INCDEC_PTR2]], align 4
; CHECK-NEXT: [[ADD6:%.*]] = fadd fast float [[TMP2]], -2.000000e+00
; CHECK-NEXT: [[INCDEC_PTR7:%.*]] = getelementptr inbounds float, float* [[DST]], i64 3		; CHECK-NEXT: [[INCDEC_PTR7:%.*]] = getelementptr inbounds float, float* [[DST]], i64 3
; CHECK-NEXT: store float [[ADD6]], float* [[INCDEC_PTR4]], align 4		; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[SRC]] to <4 x float>*
; CHECK-NEXT: [[TMP3:%.*]] = load float, float* [[INCDEC_PTR5]], align 4		; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* [[TMP0]], align 4
; CHECK-NEXT: [[ADD9:%.*]] = fadd fast float [[TMP3]], -3.000000e+00		; CHECK-NEXT: [[TMP2:%.*]] = fadd fast <4 x float> <float -1.000000e+00, float 0.000000e+00, float -2.000000e+00, float -3.000000e+00>, [[TMP1]]
; CHECK-NEXT: store float [[ADD9]], float* [[INCDEC_PTR7]], align 4		; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[DST]] to <4 x float>*
		; CHECK-NEXT: store <4 x float> [[TMP2]], <4 x float>* [[TMP3]], align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
%incdec.ptr = getelementptr inbounds float, float* %src, i64 1		%incdec.ptr = getelementptr inbounds float, float* %src, i64 1
%0 = load float, float* %src, align 4		%0 = load float, float* %src, align 4
%add = fadd fast float %0, -1.000000e+00		%add = fadd fast float %0, -1.000000e+00
%incdec.ptr1 = getelementptr inbounds float, float* %dst, i64 1		%incdec.ptr1 = getelementptr inbounds float, float* %dst, i64 1
store float %add, float* %dst, align 4		store float %add, float* %dst, align 4
[87 lines not shown]
store float %sub9, float* %incdec.ptr7, align 4		store float %sub9, float* %incdec.ptr7, align 4
ret void		ret void
}		}

define void @addsub0f(float* noalias %dst, float* noalias %src) {		define void @addsub0f(float* noalias %dst, float* noalias %src) {
; CHECK-LABEL: @addsub0f(		; CHECK-LABEL: @addsub0f(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds float, float* [[SRC:%.*]], i64 1		; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds float, float* [[SRC:%.*]], i64 1
; CHECK-NEXT: [[TMP0:%.*]] = load float, float* [[SRC]], align 4
; CHECK-NEXT: [[SUB:%.*]] = fadd fast float [[TMP0]], -1.000000e+00
; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds float, float* [[DST:%.*]], i64 1		; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds float, float* [[DST:%.*]], i64 1
; CHECK-NEXT: store float [[SUB]], float* [[DST]], align 4
; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 2		; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 2
; CHECK-NEXT: [[TMP1:%.*]] = load float, float* [[INCDEC_PTR]], align 4
; CHECK-NEXT: [[INCDEC_PTR3:%.*]] = getelementptr inbounds float, float* [[DST]], i64 2		; CHECK-NEXT: [[INCDEC_PTR3:%.*]] = getelementptr inbounds float, float* [[DST]], i64 2
; CHECK-NEXT: store float [[TMP1]], float* [[INCDEC_PTR1]], align 4
; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 3		; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 3
; CHECK-NEXT: [[TMP2:%.*]] = load float, float* [[INCDEC_PTR2]], align 4
; CHECK-NEXT: [[SUB5:%.*]] = fadd fast float [[TMP2]], -2.000000e+00
; CHECK-NEXT: [[INCDEC_PTR6:%.*]] = getelementptr inbounds float, float* [[DST]], i64 3		; CHECK-NEXT: [[INCDEC_PTR6:%.*]] = getelementptr inbounds float, float* [[DST]], i64 3
; CHECK-NEXT: store float [[SUB5]], float* [[INCDEC_PTR3]], align 4		; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[SRC]] to <4 x float>*
; CHECK-NEXT: [[TMP3:%.*]] = load float, float* [[INCDEC_PTR4]], align 4		; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* [[TMP0]], align 4
; CHECK-NEXT: [[SUB8:%.*]] = fsub fast float [[TMP3]], -3.000000e+00		; CHECK-NEXT: [[TMP2:%.*]] = fadd fast <4 x float> [[TMP1]], <float -1.000000e+00, float 0.000000e+00, float -2.000000e+00, float -3.000000e+00>
; CHECK-NEXT: store float [[SUB8]], float* [[INCDEC_PTR6]], align 4		; CHECK-NEXT: [[TMP3:%.*]] = fsub fast <4 x float> [[TMP1]], <float -1.000000e+00, float 0.000000e+00, float -2.000000e+00, float -3.000000e+00>
		; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <4 x float> [[TMP2]], <4 x float> [[TMP3]], <4 x i32> <i32 0, i32 5, i32 2, i32 7>
		; CHECK-NEXT: [[TMP5:%.*]] = bitcast float* [[DST]] to <4 x float>*
		; CHECK-NEXT: store <4 x float> [[TMP4]], <4 x float>* [[TMP5]], align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
%incdec.ptr = getelementptr inbounds float, float* %src, i64 1		%incdec.ptr = getelementptr inbounds float, float* %src, i64 1
%0 = load float, float* %src, align 4		%0 = load float, float* %src, align 4
%sub = fadd fast float %0, -1.000000e+00		%sub = fadd fast float %0, -1.000000e+00
%incdec.ptr1 = getelementptr inbounds float, float* %dst, i64 1		%incdec.ptr1 = getelementptr inbounds float, float* %dst, i64 1
store float %sub, float* %dst, align 4		store float %sub, float* %dst, align 4
[11 lines not shown]
store float %sub8, float* %incdec.ptr6, align 4		store float %sub8, float* %incdec.ptr6, align 4
ret void		ret void
}		}

define void @addsub1f(float* noalias %dst, float* noalias %src) {		define void @addsub1f(float* noalias %dst, float* noalias %src) {
; CHECK-LABEL: @addsub1f(		; CHECK-LABEL: @addsub1f(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds float, float* [[SRC:%.*]], i64 1		; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds float, float* [[SRC:%.*]], i64 1
; CHECK-NEXT: [[TMP0:%.*]] = load float, float* [[SRC]], align 4
; CHECK-NEXT: [[SUB:%.*]] = fadd fast float [[TMP0]], -1.000000e+00
; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds float, float* [[DST:%.*]], i64 1		; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds float, float* [[DST:%.*]], i64 1
; CHECK-NEXT: store float [[SUB]], float* [[DST]], align 4
; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 2		; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 2
; CHECK-NEXT: [[TMP1:%.*]] = load float, float* [[INCDEC_PTR]], align 4
; CHECK-NEXT: [[SUB1:%.*]] = fsub fast float [[TMP1]], -1.000000e+00
; CHECK-NEXT: [[INCDEC_PTR3:%.*]] = getelementptr inbounds float, float* [[DST]], i64 2		; CHECK-NEXT: [[INCDEC_PTR3:%.*]] = getelementptr inbounds float, float* [[DST]], i64 2
; CHECK-NEXT: store float [[SUB1]], float* [[INCDEC_PTR1]], align 4
; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 3		; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 3
; CHECK-NEXT: [[TMP2:%.*]] = load float, float* [[INCDEC_PTR2]], align 4
; CHECK-NEXT: [[INCDEC_PTR6:%.*]] = getelementptr inbounds float, float* [[DST]], i64 3		; CHECK-NEXT: [[INCDEC_PTR6:%.*]] = getelementptr inbounds float, float* [[DST]], i64 3
; CHECK-NEXT: store float [[TMP2]], float* [[INCDEC_PTR3]], align 4		; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[SRC]] to <4 x float>*
; CHECK-NEXT: [[TMP3:%.*]] = load float, float* [[INCDEC_PTR4]], align 4		; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* [[TMP0]], align 4
; CHECK-NEXT: [[SUB8:%.*]] = fsub fast float [[TMP3]], -3.000000e+00		; CHECK-NEXT: [[TMP2:%.*]] = fadd fast <4 x float> [[TMP1]], <float -1.000000e+00, float -1.000000e+00, float 0.000000e+00, float -3.000000e+00>
; CHECK-NEXT: store float [[SUB8]], float* [[INCDEC_PTR6]], align 4		; CHECK-NEXT: [[TMP3:%.*]] = fsub fast <4 x float> [[TMP1]], <float -1.000000e+00, float -1.000000e+00, float 0.000000e+00, float -3.000000e+00>
		; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <4 x float> [[TMP2]], <4 x float> [[TMP3]], <4 x i32> <i32 0, i32 5, i32 2, i32 7>
		; CHECK-NEXT: [[TMP5:%.*]] = bitcast float* [[DST]] to <4 x float>*
		; CHECK-NEXT: store <4 x float> [[TMP4]], <4 x float>* [[TMP5]], align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
%incdec.ptr = getelementptr inbounds float, float* %src, i64 1		%incdec.ptr = getelementptr inbounds float, float* %src, i64 1
%0 = load float, float* %src, align 4		%0 = load float, float* %src, align 4
%sub = fadd fast float %0, -1.000000e+00		%sub = fadd fast float %0, -1.000000e+00
%incdec.ptr1 = getelementptr inbounds float, float* %dst, i64 1		%incdec.ptr1 = getelementptr inbounds float, float* %dst, i64 1
store float %sub, float* %dst, align 4		store float %sub, float* %dst, align 4
[11 lines not shown]
store float %sub8, float* %incdec.ptr6, align 4		store float %sub8, float* %incdec.ptr6, align 4
ret void		ret void
}		}

define void @mulf(float* noalias %dst, float* noalias %src) {		define void @mulf(float* noalias %dst, float* noalias %src) {
; CHECK-LABEL: @mulf(		; CHECK-LABEL: @mulf(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds float, float* [[SRC:%.*]], i64 1		; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds float, float* [[SRC:%.*]], i64 1
; CHECK-NEXT: [[TMP0:%.*]] = load float, float* [[SRC]], align 4
; CHECK-NEXT: [[SUB:%.*]] = fmul fast float [[TMP0]], 2.570000e+02
; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds float, float* [[DST:%.*]], i64 1		; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds float, float* [[DST:%.*]], i64 1
; CHECK-NEXT: store float [[SUB]], float* [[DST]], align 4
; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 2		; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 2
; CHECK-NEXT: [[TMP1:%.*]] = load float, float* [[INCDEC_PTR]], align 4
; CHECK-NEXT: [[SUB3:%.*]] = fmul fast float [[TMP1]], -3.000000e+00
; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds float, float* [[DST]], i64 2		; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds float, float* [[DST]], i64 2
; CHECK-NEXT: store float [[SUB3]], float* [[INCDEC_PTR1]], align 4
; CHECK-NEXT: [[INCDEC_PTR5:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 3		; CHECK-NEXT: [[INCDEC_PTR5:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 3
; CHECK-NEXT: [[TMP2:%.*]] = load float, float* [[INCDEC_PTR2]], align 4
; CHECK-NEXT: [[INCDEC_PTR7:%.*]] = getelementptr inbounds float, float* [[DST]], i64 3		; CHECK-NEXT: [[INCDEC_PTR7:%.*]] = getelementptr inbounds float, float* [[DST]], i64 3
; CHECK-NEXT: store float [[TMP2]], float* [[INCDEC_PTR4]], align 4		; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[SRC]] to <4 x float>*
; CHECK-NEXT: [[TMP3:%.*]] = load float, float* [[INCDEC_PTR5]], align 4		; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* [[TMP0]], align 4
; CHECK-NEXT: [[SUB9:%.*]] = fmul fast float [[TMP3]], -9.000000e+00		; CHECK-NEXT: [[TMP2:%.*]] = fmul fast <4 x float> <float 2.570000e+02, float -3.000000e+00, float 1.000000e+00, float -9.000000e+00>, [[TMP1]]
; CHECK-NEXT: store float [[SUB9]], float* [[INCDEC_PTR7]], align 4		; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[DST]] to <4 x float>*
		; CHECK-NEXT: store <4 x float> [[TMP2]], <4 x float>* [[TMP3]], align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
%incdec.ptr = getelementptr inbounds float, float* %src, i64 1		%incdec.ptr = getelementptr inbounds float, float* %src, i64 1
%0 = load float, float* %src, align 4		%0 = load float, float* %src, align 4
%sub = fmul fast float %0, 2.570000e+02		%sub = fmul fast float %0, 2.570000e+02
%incdec.ptr1 = getelementptr inbounds float, float* %dst, i64 1		%incdec.ptr1 = getelementptr inbounds float, float* %dst, i64 1
store float %sub, float* %dst, align 4		store float %sub, float* %dst, align 4
[... 92 lines hidden ...]
store float %add9, float* %incdec.ptr7, align 4		store float %add9, float* %incdec.ptr7, align 4
ret void		ret void
}		}

define void @sub0fn(float* noalias %dst, float* noalias %src) {		define void @sub0fn(float* noalias %dst, float* noalias %src) {
; CHECK-LABEL: @sub0fn(		; CHECK-LABEL: @sub0fn(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds float, float* [[SRC:%.*]], i64 1		; CHECK-NEXT: [[INCDEC_PTR:%.*]] = getelementptr inbounds float, float* [[SRC:%.*]], i64 1
; CHECK-NEXT: [[TMP0:%.*]] = load float, float* [[SRC]], align 4
; CHECK-NEXT: [[ADD:%.*]] = fadd fast float [[TMP0]], -1.000000e+00
; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds float, float* [[DST:%.*]], i64 1		; CHECK-NEXT: [[INCDEC_PTR1:%.*]] = getelementptr inbounds float, float* [[DST:%.*]], i64 1
; CHECK-NEXT: store float [[ADD]], float* [[DST]], align 4
; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 2		; CHECK-NEXT: [[INCDEC_PTR2:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 2
; CHECK-NEXT: [[TMP1:%.*]] = load float, float* [[INCDEC_PTR]], align 4
; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds float, float* [[DST]], i64 2		; CHECK-NEXT: [[INCDEC_PTR4:%.*]] = getelementptr inbounds float, float* [[DST]], i64 2
; CHECK-NEXT: store float [[TMP1]], float* [[INCDEC_PTR1]], align 4
; CHECK-NEXT: [[INCDEC_PTR5:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 3		; CHECK-NEXT: [[INCDEC_PTR5:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 3
; CHECK-NEXT: [[TMP2:%.*]] = load float, float* [[INCDEC_PTR2]], align 4
; CHECK-NEXT: [[ADD6:%.*]] = fadd float [[TMP2]], -2.000000e+00
; CHECK-NEXT: [[INCDEC_PTR7:%.*]] = getelementptr inbounds float, float* [[DST]], i64 3		; CHECK-NEXT: [[INCDEC_PTR7:%.*]] = getelementptr inbounds float, float* [[DST]], i64 3
; CHECK-NEXT: store float [[ADD6]], float* [[INCDEC_PTR4]], align 4		; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[SRC]] to <4 x float>*
; CHECK-NEXT: [[TMP3:%.*]] = load float, float* [[INCDEC_PTR5]], align 4		; CHECK-NEXT: [[TMP1:%.*]] = load <4 x float>, <4 x float>* [[TMP0]], align 4
; CHECK-NEXT: [[ADD9:%.*]] = fadd float [[TMP3]], -3.000000e+00		; CHECK-NEXT: [[TMP2:%.*]] = fadd <4 x float> <float -1.000000e+00, float 0.000000e+00, float -2.000000e+00, float -3.000000e+00>, [[TMP1]]
; CHECK-NEXT: store float [[ADD9]], float* [[INCDEC_PTR7]], align 4		; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[DST]] to <4 x float>*
		; CHECK-NEXT: store <4 x float> [[TMP2]], <4 x float>* [[TMP3]], align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
%incdec.ptr = getelementptr inbounds float, float* %src, i64 1		%incdec.ptr = getelementptr inbounds float, float* %src, i64 1
%0 = load float, float* %src, align 4		%0 = load float, float* %src, align 4
%add = fadd fast float %0, -1.000000e+00		%add = fadd fast float %0, -1.000000e+00
%incdec.ptr1 = getelementptr inbounds float, float* %dst, i64 1		%incdec.ptr1 = getelementptr inbounds float, float* %dst, i64 1
store float %add, float* %dst, align 4		store float %add, float* %dst, align 4
[... 133 lines hidden ...]

Revision Contents

Diff 107683

lib/Transforms/Vectorize/SLPVectorizer.cpp

test/Transforms/SLPVectorizer/X86/vect_copyable_in_binops.ll