This is an archive of the discontinued LLVM Phabricator instance.

[SLP] Fix for PR30787: Failure to beneficially vectorize 'copyable' elements in integer binary ops.
Needs RevisionPublic

Authored by dtemirbulatov on Jan 19 2017, 8:56 AM.

Details

Summary

Patch tries to improve vectorization of the following code:

void add1(int * __restrict dst, const int * __restrict src) {
  *dst++ = *src++;
  *dst++ = *src++ + 1;
  *dst++ = *src++ + 2;
  *dst++ = *src++ + 3;
}

Currently this code cannot be vectorized because the very first operation is not a binary add, but just a load.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
dtemirbulatov added inline comments.Jan 30 2018, 7:30 AM
test/Transforms/SLPVectorizer/X86/internal-dep.ll
10 ↗(On Diff #131752)

ok, Thanks

dtemirbulatov added inline comments.Jan 30 2018, 7:52 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
4079 ↗(On Diff #131574)

probably level 2 or 3 dependencies might be ok since it is not encapsulated in a single operation.

What is happening with this patch? It's been in development for over a year now and still seems to be having problems. Getting PR30787 fixed would be VERY useful and I'm thinking we should be looking at alternatives if this patch is going to carry on stalling/reverting.

I will update the solution shortly, I am currently in changing scheduler in order to avoid non-alternative opcodes in a bundle.

dtemirbulatov updated this revision to Diff 140208.EditedMar 29 2018, 4:38 AM

Removed internal vector dependency check as incorrect and I added the dependency check in case of partial bundle vectorization with non-alternative operations only. At BoUpSLP::vectorizeTree, BoUpSLP::buildTree.

ABataev added inline comments.Mar 29 2018, 6:34 AM
test/Transforms/SLPVectorizer/SystemZ/pr34619.ll
2 ↗(On Diff #140208)

Commit this test as a separate NFC patch with the checks against trunk

test/Transforms/SLPVectorizer/X86/partail.ll
2 ↗(On Diff #140208)

Commit this test as a separate NFC patch with the checks against trunk

test/Transforms/SLPVectorizer/X86/pr35497.ll
2 ↗(On Diff #140208)

Commit this test as a separate NFC patch with the checks against trunk

test/Transforms/SLPVectorizer/X86/resched.ll
2 ↗(On Diff #140208)

Commit this test as a separate NFC patch with the checks against trunk

Update after tests commit, formatting, delete Instruction::ExtractElement restricktion from tryToRepresentAsInstArg.

ABataev added inline comments.Mar 30 2018, 9:08 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
1483 ↗(On Diff #140365)
  1. Remove extra parens around Entry->State.IsNonAlt
  2. Why are you skipping bundles with non alternarive opcodes only?
3007–3011 ↗(On Diff #140365)

Rewrite it this way:

SmallPtrSet<Instruction *, 4> BundleInst;
Bundle = Bundle->FirstInBundle;
LastInst = Bundle->Inst;
while (Bundle) {
  BundleInst.insert(Bundle->Inst);
  Bundle = Bundle->NextInBundle;
}
3012–3014 ↗(On Diff #140365)

Why do you need to scan all the instructions in the basic block starting from First? Why you can't use only scheduled instructions?

3015 ↗(On Diff #140365)

Move ++Iter to the header of for loop

3811 ↗(On Diff #140365)

Name of the variable must start from capital letter.

3814 ↗(On Diff #140365)
  1. The code is not formatted
  2. Seems to me you missed break;

Update after Alexey's remarks.

dtemirbulatov added inline comments.Mar 31 2018, 8:10 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
1483 ↗(On Diff #140365)

yes, we don't need to check Entry->State.IsNonAlt here, that is similar to getTreeEntry(U, Scalar) == Entry.

3007–3011 ↗(On Diff #140365)

Done. Thanks.

3012–3014 ↗(On Diff #140365)

What do you mean? please elaborate. If you mean ScheduleData here then it is also not always gives up correct sequence of scheduled instructions in a block. Like, bundle member with NextInBundle equals to null is not guaranty to be the last instruction of this bundle among other instructions. If we start with First then it is highly likely that we could iterate across all bundle members and exit instead of iterating to the end of the basic block. Also, I am thinking, to avoid this overhead we could note the last scheduled instruction in BB during scheduling in scheduleBlock() and keep this information in ScheduleData structure.

3015 ↗(On Diff #140365)

Done.

3811 ↗(On Diff #140365)

Done.

3814 ↗(On Diff #140365)

Correct, Thanks.

Rebased, improved complexity of setInsertPointAfterBundle() for already scheduled instructions, minor changes in scheduleBlock()

Also, your code seems not quite formatted. Please, use clang-format on your changes to format it properly.

lib/Transforms/Vectorize/SLPVectorizer.cpp
433 ↗(On Diff #142059)

Enclose it into braces

435 ↗(On Diff #142059)

This too

474 ↗(On Diff #142059)
bool IsNonAlt = llvm::one_of(VL, [Opcode, AltOpcode](Value *V) {return isa<Instruction>(V) && !sameOpcodeOrAlt(Opcode, AltOpcode, cast<Instruction>(V)->getOpcode());});
1616 ↗(On Diff #142059)

Add the debug message here

3112 ↗(On Diff #142059)

Why you can't put bundles in the list in the right order: from the very first instruction to the very last?

3938–3945 ↗(On Diff #142059)

Better to do it this way:

if (llvm::any_of(Scalar->users(), [this, Entry, Scalar](User *U){return !getTreeEntry(U) && getTreeEntry(U, Scalar) == Entry;}))
  continue;
4481 ↗(On Diff #142059)

pickedInst->PickedInst

4486 ↗(On Diff #142059)

Remove this, it is not needed

4500–4502 ↗(On Diff #142059)

Remove it, does not do anything

Update after remarks from Alexey Bataev.

dtemirbulatov added inline comments.Apr 16 2018, 2:27 PM
lib/Transforms/Vectorize/SLPVectorizer.cpp
3112 ↗(On Diff #142059)

I could do this in scheduleBlock() function with a queue, but that could add additional complexity.

ABataev added inline comments.Apr 17 2018, 8:10 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
3112 ↗(On Diff #142059)

Could you explain why?

dtemirbulatov added inline comments.Apr 22 2018, 6:19 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
3112 ↗(On Diff #142059)

One instruction could belong to one or more separate bundles... and while we try to change order in bundles at scheduleBlock() we have to update ScheduleDataMap, ExtraScheduleDataMap.

dtemirbulatov added inline comments.Apr 23 2018, 4:09 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
3112 ↗(On Diff #142059)

I mean pseudo operation could occur in more than one bundle.

ABataev added inline comments.Apr 23 2018, 6:24 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
3112 ↗(On Diff #142059)

But these schedule bundles must have different scheduling region id and they must be in a different bundles, why their order changes?

Implement post scheduling bundle reorder of the last element of the bundle according how it was scheduled.

dtemirbulatov added inline comments.May 22 2018, 6:00 PM
lib/Transforms/Vectorize/SLPVectorizer.cpp
3112 ↗(On Diff #142059)

The bundle is differerent, but scheduling region id is the same.

dtemirbulatov added inline comments.May 22 2018, 6:27 PM
lib/Transforms/Vectorize/SLPVectorizer.cpp
3112 ↗(On Diff #142059)

I mean, for example, for this function:
define void @add0(i32* noalias %dst, i32* noalias %src) {
entry:

%incdec.ptr = getelementptr inbounds i32, i32* %src, i64 1
%0 = load i32, i32* %src, align 4
%add = add nsw i32 %0, 1
%incdec.ptr1 = getelementptr inbounds i32, i32* %dst, i64 1
store i32 %add, i32* %dst, align 4
%incdec.ptr2 = getelementptr inbounds i32, i32* %src, i64 2
%1 = load i32, i32* %incdec.ptr, align 4
%add3 = add nsw i32 %1, 1
%incdec.ptr4 = getelementptr inbounds i32, i32* %dst, i64 2
store i32 %add3, i32* %incdec.ptr1, align 4
%incdec.ptr5 = getelementptr inbounds i32, i32* %src, i64 3
%2 = load i32, i32* %incdec.ptr2, align 4
%add6 = add nsw i32 %2, 2
%incdec.ptr7 = getelementptr inbounds i32, i32* %dst, i64 3
store i32 %add6, i32* %incdec.ptr4, align 4
%3 = load i32, i32* %incdec.ptr5, align 4
%add9 = add nsw i32 %3, 3
store i32 %add9, i32* %incdec.ptr7, align 4
ret void

}

We have two bundles:
[ %3 = load i32, i32* %src, align 4; %add3 = add nsw i32 %2, 1; %add6 = add nsw i32 %1, 2; %add9 = add nsw i32 %0, 3]
and
[ %3 = load i32, i32* %src, align 4; %2 = load i32, i32* %incdec.ptr, align 4; %1 = load i32, i32* %incdec.ptr2, align 4; %0 = load i32, i32* %incdec.ptr5, align 4]
with the same instruction %3 = load i32, i32* %src, align 4 and one is a pseudo instruction in this bundle [ %3 = load i32, i32* %src, align 4; %add3; %add6; %add9]
and all in the same scheduling region id that equal to 1.

dtemirbulatov added inline comments.May 22 2018, 6:34 PM
lib/Transforms/Vectorize/SLPVectorizer.cpp
3112 ↗(On Diff #142059)

why their order changes?

sometimes we have to reschedule a pseudo instruction first in both bundles in order to form correct dependencies.

@dtemirbulatov @ABataev Is there anything that can be done to please keep this patch moving? The next release branch won't be far away now and I originally requested this over a year and a half ago (PR30787), and ideally I'd like to get this in and then I can more easily implement PR33744 as well in time.

I notice there's still some minor refactoring in the patch to use InstructionState in newTreeEntry/tryScheduleBundle calls - is it worth getting that done quickly first?

What about testing - how much testing has been done with external code?

@dtemirbulatov @ABataev Is there anything that can be done to please keep this patch moving? The next release branch won't be far away now and I originally requested this over a year and a half ago (PR30787), and ideally I'd like to get this in and then I can more easily implement PR33744 as well in time.

I think that algorithm at 4512 is incorrect, we need to copy common instructions for bundles, instead of reordering bundles. Alexey?

ABataev added a subscriber: Ayal.Jun 4 2018, 5:44 PM

@dtemirbulatov @ABataev Is there anything that can be done to please keep this patch moving? The next release branch won't be far away now and I originally requested this over a year and a half ago (PR30787), and ideally I'd like to get this in and then I can more easily implement PR33744 as well in time.

I notice there's still some minor refactoring in the patch to use InstructionState in newTreeEntry/tryScheduleBundle calls - is it worth getting that done quickly first?

What about testing - how much testing has been done with external code?

Hi guys, sorry, but I cannot review this patch currently as I'm on the vacation for 3 more weeks. It would be good if somebody else could look at this patch. Try to ask Ayal Zaks (@Ayal), maybe he could help you. As to the implementation itself, I don't like the way it is implemented now. Current scheduling implementation looks like a crutch or a hack. It is very complex, has some unclear logic that may be broken in many ways.

Vasilis added a subscriber: Vasilis.Jun 5 2018, 3:02 PM
Vasilis removed a subscriber: Vasilis.Jun 5 2018, 3:04 PM
vporpo added a subscriber: vporpo.Jun 5 2018, 3:16 PM

Hi, when do you plan to update the path for the latest version? I have an idea of improving the scheduling model, maybe it will help us to resolve the problems with the patch.

Hi, when do you plan to update the path for the latest version? I have an idea of improving the scheduling model, maybe it will help us to resolve the problems with the patch.

Hi, I think I have one issue remaining with LNT's MultiSource/Benchmarks/Olden/bh/bh, I will rebased the current version today, tomorrow. Thanks.

@dtemirbulatov @ABataev I'm not sure how much this will cross over but I'm investigating how to extend the alternate opcode mechanism to work with non-binary instructions. Initially I'm just looking at cast + call operators (e.g. sitofp/uitofp, floor/ceil etc.) but even that requires moderate refactoring of the getEntryCost and vectorizeTree ShuffleVector handling. What I'm not sure of is whether we could extend this idea to accept any partial vectorization and the alternate becomes a pass through - what do you think?

If not there is scope to further simplify this patch, for instance @spatel's work in InstCombine for PR37806 should mean that you can rely on InstCombine to perform much of the work in getDefaultConstantForOpcode etc. and all SLP needs to do is create a "passthrough" SK_Select shuffle stage.

spatel added a comment.Jul 2 2018, 9:46 AM

If not there is scope to further simplify this patch, for instance @spatel's work in InstCombine for PR37806 should mean that you can rely on InstCombine to perform much of the work in getDefaultConstantForOpcode etc. and all SLP needs to do is create a "passthrough" SK_Select shuffle stage.

For reference in D48830, I'm proposing that we use (and extend) ConstantExpr::getBinOpIdentity(). IIUC, that would be the same thing that is shown as getDefaultConstantForOpcode() here.

@dtemirbulatov @ABataev I'm not sure how much this will cross over but I'm investigating how to extend the alternate opcode mechanism to work with non-binary instructions. Initially I'm just looking at cast + call operators (e.g. sitofp/uitofp, floor/ceil etc.) but even that requires moderate refactoring of the getEntryCost and vectorizeTree ShuffleVector handling. What I'm not sure of is whether we could extend this idea to accept any partial vectorization and the alternate becomes a pass through - what do you think?

If not there is scope to further simplify this patch, for instance @spatel's work in InstCombine for PR37806 should mean that you can rely on InstCombine to perform much of the work in getDefaultConstantForOpcode etc. and all SLP needs to do is create a "passthrough" SK_Select shuffle stage.

I think this is possible if I correctly understood your idea. But we need to make this patch land at first. To do so we need to resolve the problems with the scheduling.

If not there is scope to further simplify this patch, for instance @spatel's work in InstCombine for PR37806 should mean that you can rely on InstCombine to perform much of the work in getDefaultConstantForOpcode etc. and all SLP needs to do is create a "passthrough" SK_Select shuffle stage.

Yes, this is also a possible solution. What we need to do in this case is to tweak the cost model + improve gathering algorithm. Yes, that might work.

If not there is scope to further simplify this patch, for instance @spatel's work in InstCombine for PR37806 should mean that you can rely on InstCombine to perform much of the work in getDefaultConstantForOpcode etc. and all SLP needs to do is create a "passthrough" SK_Select shuffle stage.

For reference in D48830, I'm proposing that we use (and extend) ConstantExpr::getBinOpIdentity(). IIUC, that would be the same thing that is shown as getDefaultConstantForOpcode() here.

Yes, I see, we can try to use it. AT least we need to think about this solution, need to estimate all pros/cons here

here is change before 74cd05c4a4f94a27daf2d3fc173e7213060cc47c commit, I am currently rebaseing the change.

dtemirbulatov added inline comments.Jul 2 2018, 12:17 PM
lib/Transforms/Vectorize/SLPVectorizer.cpp
649 ↗(On Diff #153765)

we don't need to do this.

dtemirbulatov added inline comments.Jul 2 2018, 12:22 PM
lib/Transforms/Vectorize/SLPVectorizer.cpp
3146 ↗(On Diff #153765)

I will move this solution to a dedicated function, so we don't have to measure distance here.

OK - please shout if there is anything I can do to help - I realise my alternate opcode refactoring has made rebasing this patch less straightforward!

Rebase. Still, I need to implement bundle reordering instead of the change in setInsertPointAfterBundle(), fully tested this change. I will split this change in several independent reviews.

Fixed one more issue with duplicating memory dependencies in calculateDependencies() by checking whether we already counted this particular dependency.

RKSimon added inline comments.Jul 31 2018, 4:07 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
469 ↗(On Diff #157957)

InstructionState was keeping the Base/Alt instructions the same if AltOpcodeNum ==0 (BaseIndex ==AltIndex) - why are you inserting nulls?

1526 ↗(On Diff #157957)

Can we use ConstantExpr::getBinOpIdentity instead?

dtemirbulatov added inline comments.Jul 31 2018, 5:41 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
469 ↗(On Diff #157957)

yes, correct, Thanks.

1526 ↗(On Diff #157957)

no, ConstantExpr::getBinOpIdentity does support only commutative operations.

lebedev.ri added inline comments.
lib/Transforms/Vectorize/SLPVectorizer.cpp
1526 ↗(On Diff #157957)

Are you sure?

Constant *ConstantExpr::getBinOpIdentity(unsigned Opcode, Type *Ty,
                                         bool AllowRHSConstant) {
...

  // Non-commutative opcodes: AllowRHSConstant must be set.
  if (!AllowRHSConstant)
    return nullptr;
dtemirbulatov added inline comments.Jul 31 2018, 6:01 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
1526 ↗(On Diff #157957)

oh, yes, sorry, it just a different behaviour here. we want for example 1 division and ConstantExpr::getBinOpIdentity would return 0.

Rebase, Fixed RKSimon's remark, implemented bundle reordering function.

Add hash code based indexing instead of instruction based, split ScheduleData into InstScheduleData, PseudoScheduleData, improve diagnostics of bundle scheduling.

Why you don't want to use pair<Instruction*, Opcode> as the key in all maps/sets? I expect that it will lead to much more simpler slution.

lib/Transforms/Vectorize/SLPVectorizer.cpp
308 ↗(On Diff #166141)

Instead, I would check for the supported instructions rather than for the unsupported ones.

ABataev added inline comments.Sep 28 2018, 1:14 PM
lib/Transforms/Vectorize/SLPVectorizer.cpp
398 ↗(On Diff #166141)

Not sure if this always produces unique hashes, might lead to the incorrect compiler work.

Implemented pair<Value*, Opcode> as the key in all maps/sets.
Fixed issue with incorrect memory dependency that is attached in testcase memory-dep.ll.
Allow Non-alterative operations to be stored in InstScheduleDataMap.
Removed IsNonAlt field out of InstructionsState.

dtemirbulatov added inline comments.Oct 10 2018, 3:46 PM
lib/Transforms/Vectorize/SLPVectorizer.cpp
308 ↗(On Diff #166141)

The other list is a bit long in my implementation, this one looks better.

Update after I found another couple of errors after additional testing the change. Here are changes:
Removed OpValue field out of PseudoScheduleData.
Forbid any bundles with non-alternative operations and remainder operation, see rem-bundle.ll.
Fixed error in setInsertPointAfterBundle() function by using getScheduleData() instead of getInstrScheduleData and if a bundle member is present multiple bundles at the same time then walk through the bundle to find the last scheduled member of the bundle. see insert-after-multiple-bundle.ll
Restore MemoryDependencies to SmallVector, we don't have to count a member presents in calculateDependencies().

dtemirbulatov updated this revision to Diff 171890.EditedOct 31 2018, 4:29 AM

Implemented Map<Instruction*, std::pair<Value *Parent, unsigned Opcode> indexing for ScalarToTreeEntry, PseudoInstScheduleDataMap.
Added reorderBundles() function to reorder bundles that have common instructions and common instructions were scheduled at least twice. We don't want to note which bundle was scheduled first. We could determine instruction layout after SLP scheduling.

RKSimon added inline comments.Nov 5 2018, 9:42 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
304 ↗(On Diff #171890)

Repeated Instruction::SRem

RKSimon added inline comments.Nov 5 2018, 9:46 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
1625 ↗(On Diff #171890)

@spatel @dtemirbulatov Can we use getBinOpIdentity yet ?

ABataev added inline comments.Nov 5 2018, 10:04 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
1100 ↗(On Diff #171890)

Very strange that you still need particular scheduling class for the pseudo instructions, I think you can use the original data structure if you correctly implement the pseudo instruction itself. I still don't see all the changes we talked about.

308 ↗(On Diff #166141)

I still think it is better to limit this with the list of supported operations rather than the list of the unsupported ones.

spatel added inline comments.Nov 5 2018, 2:27 PM
lib/Transforms/Vectorize/SLPVectorizer.cpp
1625 ↗(On Diff #171890)

Yes - if anyone has suggestions for making that 'AllowRHSConstant' param clearer, let me know.

The only problem that I see is that this code is returning +0.0 as the default constant for an fadd (because the caller guarantees 'nsz'?).
I worked around something like that here:
rL346143

I can't tell if SLP would want to do something like that.

dtemirbulatov added inline comments.Nov 7 2018, 8:11 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
304 ↗(On Diff #171890)

oh, thanks. I missed that.

1100 ↗(On Diff #171890)

ok, I am implementing now.

1625 ↗(On Diff #171890)

ok, yes, looks like it is going to work.

308 ↗(On Diff #166141)

ok, I will change that.

Looks like I fixed all previous remarks also during testing I found two more issues with the change and I fixed both.

dtemirbulatov added inline comments.Nov 14 2018, 10:22 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
4130 ↗(On Diff #174061)

We have to clear all dependancy calculations since the pseudo instruction might use already calculated SD node with calculated dependency, look at scheduling_pseudo.ll testcase.

4581 ↗(On Diff #174061)

Please check schedule-bundle1.ll testcase, without this change scheduling is not correct.

dtemirbulatov updated this revision to Diff 174063.EditedNov 14 2018, 10:30 AM

Oops, I found a few typos, Formated tryToRepresentAsInstArg() and removed the second SRem from isRemainder().

ABataev added inline comments.Nov 14 2018, 10:44 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
812 ↗(On Diff #174063)

Instead of ArrayRef<Value *> I expected to see something like ArrayRef<InstructionOrPseudoInstruction>, where InstructionOrPseudoInstruction is the class that represents the instruction/Value itself, or the pseudo instruction.

833–837 ↗(On Diff #174063)

Why do you need this?

853 ↗(On Diff #174063)

Again, you need all this code because you did not implemented what we discussed. Try to use the InstructionOrPseudoInstruction like class to represent values/instructions and pseudo-instructions. It should be much easier to implement and a lot of changes will just go away.

304 ↗(On Diff #174061)

Still 2 SRems

387 ↗(On Diff #174061)

Restore the original function here.

408 ↗(On Diff #174061)

You definitely need comments here

@dtemirbulatov Any movement on this? It'd be great to get this in for the 8.0 release!

@dtemirbulatov Any movement on this? It'd be great to get this in for the 8.0 release!

yes, I just refactored getTreeEntry and ... according to ABataev request, I will update after additional testing in a day or two.

Introduced InstructionOrPseudo structure and removed "Instruction::Invoke" out of tryToRepresentAsInstArg() function with error example in invoke.ll testcase.

lebedev.ri added inline comments.Dec 7 2018, 6:58 AM
test/Transforms/SLPVectorizer/X86/invoke.ll
3 ↗(On Diff #177207)

Unless this currently crashes, precommit?

dtemirbulatov marked an inline comment as done.Dec 7 2018, 7:06 AM
dtemirbulatov added inline comments.
test/Transforms/SLPVectorizer/X86/invoke.ll
3 ↗(On Diff #177207)

yes, thanks, I will remove this line.

dtemirbulatov marked an inline comment as done.Dec 7 2018, 7:23 AM
dtemirbulatov added inline comments.
test/Transforms/SLPVectorizer/X86/invoke.ll
3 ↗(On Diff #177207)

Oh, No crash currently on pre-commit, I will remove the whole test.

lebedev.ri added inline comments.Dec 7 2018, 7:26 AM
test/Transforms/SLPVectorizer/X86/invoke.ll
3 ↗(On Diff #177207)

What i have meant to say is, unless this test file currently causes a crash, it would be best to commit it now, thus the diff will show the diff, and not a whole new test file.

dtemirbulatov marked an inline comment as done.Dec 7 2018, 7:44 AM
dtemirbulatov added inline comments.
test/Transforms/SLPVectorizer/X86/invoke.ll
3 ↗(On Diff #177207)

Ok

ABataev added inline comments.Dec 7 2018, 8:41 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
334 ↗(On Diff #177207)

Turn this to a class. Also, not enough comments, the design of the structure also should be improved. Currently, it is too complex. Maybe base class + templates will help.

394 ↗(On Diff #177207)

Please, follow the coding standard in the new code. Also, some of the functions can be replaced by some standard functions, just like in this case it can be replaced by llvm::all_of

410 ↗(On Diff #177207)

I think, the pseudo instruction is always can be considered as the one with the same block, because you can put it easily into the current block

427 ↗(On Diff #177207)

I think this function also should accept ArrayRef<InstructionOrPseudo *> as an input param

801 ↗(On Diff #177207)

Left and Right must be SmallVectorImpl<InstructionOrPseudo *> &?

845 ↗(On Diff #177207)

Restore the original

884 ↗(On Diff #177207)

Also, seems to me this must be an InstructionOrPseudoOp

1012 ↗(On Diff #177207)

Why do you need this new function? No comments and explanation.

1076 ↗(On Diff #177207)

Pseudo instruction must be just ignored?

1188 ↗(On Diff #177207)

Again, I think it must be InstructionOrPseudoOp

1264 ↗(On Diff #177207)

Again, InstructionOrPseudoOp

1324 ↗(On Diff #177207)

If you implement InstructionOrPseudoOp correctly, this reorder stuff should not be required, I think

1345 ↗(On Diff #177207)

Why you cannot store everything in a single map?

dtemirbulatov marked an inline comment as done.Dec 28 2018, 6:29 AM
dtemirbulatov added inline comments.
lib/Transforms/Vectorize/SLPVectorizer.cpp
1324 ↗(On Diff #177207)

well, There should be several(>=2) independent scheduling events(one for real instruction and other for pseudos) and there is just one real instruction, in the end, I don't see how it could be done without reordering or tracking the last scheduled instance for the same instruction. We could introduce something like IsLastScheduled field in ScheduleData struct, but it would be quite similar to reordering.

ABataev added inline comments.Dec 28 2018, 6:39 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
1324 ↗(On Diff #177207)

If you add the real instruction instead of this pseudoinstruction, will you need all these scheduling events? No. Will you need to do some extra reordering etc.? No. Why you cannot simulate it with the new class/structure?

dtemirbulatov marked an inline comment as done.Dec 28 2018, 7:20 AM
dtemirbulatov added inline comments.
lib/Transforms/Vectorize/SLPVectorizer.cpp
1324 ↗(On Diff #177207)

do you mean that the last one scheduling becomes the real one and we just ignore any pseudos?

dtemirbulatov marked an inline comment as done.Dec 28 2018, 7:22 AM
dtemirbulatov added inline comments.
lib/Transforms/Vectorize/SLPVectorizer.cpp
1324 ↗(On Diff #177207)

do you mean that the last one scheduled becomes the real one and we just ignore any pseudos?

ABataev added inline comments.Dec 28 2018, 7:24 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
1324 ↗(On Diff #177207)

No. I mean, that if you insert the real instructions instead of those pseudo-instructions, you won't need all that reordering/new scheduling etc. Why can't you mimic this behavior with the pseudo-instruction?

dtemirbulatov marked an inline comment as done.Dec 28 2018, 10:22 AM
dtemirbulatov added inline comments.
lib/Transforms/Vectorize/SLPVectorizer.cpp
1324 ↗(On Diff #177207)

hmm, There are at least NextLoadStore dependancies that we break if we, for example, insert real Load instruction somewhere. Or with could recalculate NextLoadStore. Or maybe mimic pseudo in some another way.

ABataev added inline comments.Dec 28 2018, 10:24 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
1324 ↗(On Diff #177207)

I don't think that they can be broken as we're not going to insert new Loads/Stores, just some binops. SO, the loads/stores and the the corresponding dependencies should not be affected.

removed bundle reordering by replacing pseudo instructions with real ones.

dtemirbulatov marked 12 inline comments as done.Jan 18 2019, 12:19 PM
dtemirbulatov added inline comments.
lib/Transforms/Vectorize/SLPVectorizer.cpp
334 ↗(On Diff #177207)

I see this class as just one, BTW it is a container.

410 ↗(On Diff #177207)

I have this functionality, but let us for now minimize functionality change here, I will follow this change right after that check-in.

801 ↗(On Diff #177207)

That change further increases the size of this review, we can change that later.

884 ↗(On Diff #177207)

no, looks like "Instruction" here is more convenient.

This revision was not accepted when it landed; it landed in state Needs Review.Oct 7 2019, 5:32 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptOct 7 2019, 5:32 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
RKSimon reopened this revision.Oct 7 2019, 6:07 AM

reopening - phab seems to be a bit broken

RKSimon requested changes to this revision.Oct 7 2019, 6:07 AM
This revision now requires changes to proceed.Oct 7 2019, 6:07 AM

Reping

In process of rebasing and simplifying proposed change.

Hello, this work was brought to my attention in D102748. Would be good to get this in, and seems almost finished too. Any plans to pick this up? Otherwise I think I would be interested in doing that.

Hello, this work was brought to my attention in D102748. Would be good to get this in, and seems almost finished too. Any plans to pick this up? Otherwise I think I would be interested in doing that.

This should be much easier to implement after non-power-2 in SLP support.

Hello, this work was brought to my attention in D102748. Would be good to get this in, and seems almost finished too. Any plans to pick this up? Otherwise I think I would be interested in doing that.

This should be much easier to implement after non-power-2 in SLP support.

Could you elaborate a bit more on this just for my understanding? E.g., what is this exactly, and is this an ongoing effort in the SLP vectoriser?

Hello, this work was brought to my attention in D102748. Would be good to get this in, and seems almost finished too. Any plans to pick this up? Otherwise I think I would be interested in doing that.

This should be much easier to implement after non-power-2 in SLP support.

Could you elaborate a bit more on this just for my understanding? E.g., what is this exactly, and is this an ongoing effort in the SLP vectoriser?

Here is D57059. I'm splitting it into smaller patches and committing them step-by-step.

Thanks, interesting, having a look there!

Matt added a subscriber: Matt.May 20 2021, 8:07 AM
xbolva00 added a comment.EditedMay 22 2021, 4:08 PM

Hello, this work was brought to my attention in D102748. Would be good to get this in, and seems almost finished too. Any plans to pick this up? Otherwise I think I would be interested in doing that.

This should be much easier to implement after non-power-2 in SLP support.

Can you explain why?

x1 + 1
x2
x3 + 6
x4 + 8

How that patch will help you? You need to add “fake” noop addition “ + 0” for x2.

Hello, this work was brought to my attention in D102748. Would be good to get this in, and seems almost finished too. Any plans to pick this up? Otherwise I think I would be interested in doing that.

This should be much easier to implement after non-power-2 in SLP support.

Can you explain why?

x1 + 1
x2
x3 + 6
x4 + 8

How that patch will help you? You need to add “fake” noop addition “ + 0” for x2.

It is pretty similar to non-pow-2 functionality, just instead of undefs need to use 0 for adds, 1 for mul etc. I can't provide more details currently but I'm going to revive it after non-pow-2 patch. I assume we won't need to change scheduling model.

Ah right:) I understand now.