Page MenuHomePhabricator

[SLP] Fix for PR30787: Failure to beneficially vectorize 'copyable' elements in integer binary ops.
Needs ReviewPublic

Authored by dtemirbulatov on Jan 19 2017, 8:56 AM.

Details

Summary

Patch tries to improve vectorization of the following code:

void add1(int * __restrict dst, const int * __restrict src) {
  *dst++ = *src++;
  *dst++ = *src++ + 1;
  *dst++ = *src++ + 2;
  *dst++ = *src++ + 3;
}

Currently this code cannot be vectorized because the very first operation is not a binary add, but just a load.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Rebase. Still, I need to implement bundle reordering instead of the change in setInsertPointAfterBundle(), fully tested this change. I will split this change in several independent reviews.

Fixed one more issue with duplicating memory dependencies in calculateDependencies() by checking whether we already counted this particular dependency.

RKSimon added inline comments.Jul 31 2018, 4:07 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
519

InstructionState was keeping the Base/Alt instructions the same if AltOpcodeNum ==0 (BaseIndex ==AltIndex) - why are you inserting nulls?

1592

Can we use ConstantExpr::getBinOpIdentity instead?

dtemirbulatov added inline comments.Jul 31 2018, 5:41 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
519

yes, correct, Thanks.

1592

no, ConstantExpr::getBinOpIdentity does support only commutative operations.

lebedev.ri added inline comments.
lib/Transforms/Vectorize/SLPVectorizer.cpp
1592

Are you sure?

Constant *ConstantExpr::getBinOpIdentity(unsigned Opcode, Type *Ty,
                                         bool AllowRHSConstant) {
...

  // Non-commutative opcodes: AllowRHSConstant must be set.
  if (!AllowRHSConstant)
    return nullptr;
dtemirbulatov added inline comments.Jul 31 2018, 6:01 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
1592

oh, yes, sorry, it just a different behaviour here. we want for example 1 division and ConstantExpr::getBinOpIdentity would return 0.

Rebase, Fixed RKSimon's remark, implemented bundle reordering function.

Add hash code based indexing instead of instruction based, split ScheduleData into InstScheduleData, PseudoScheduleData, improve diagnostics of bundle scheduling.

Why you don't want to use pair<Instruction*, Opcode> as the key in all maps/sets? I expect that it will lead to much more simpler slution.

lib/Transforms/Vectorize/SLPVectorizer.cpp
274

Instead, I would check for the supported instructions rather than for the unsupported ones.

460

Not sure if this always produces unique hashes, might lead to the incorrect compiler work.

Implemented pair<Value*, Opcode> as the key in all maps/sets.
Fixed issue with incorrect memory dependency that is attached in testcase memory-dep.ll.
Allow Non-alterative operations to be stored in InstScheduleDataMap.
Removed IsNonAlt field out of InstructionsState.

dtemirbulatov added inline comments.Oct 10 2018, 3:46 PM
lib/Transforms/Vectorize/SLPVectorizer.cpp
274

The other list is a bit long in my implementation, this one looks better.

Update after I found another couple of errors after additional testing the change. Here are changes:
Removed OpValue field out of PseudoScheduleData.
Forbid any bundles with non-alternative operations and remainder operation, see rem-bundle.ll.
Fixed error in setInsertPointAfterBundle() function by using getScheduleData() instead of getInstrScheduleData and if a bundle member is present multiple bundles at the same time then walk through the bundle to find the last scheduled member of the bundle. see insert-after-multiple-bundle.ll
Restore MemoryDependencies to SmallVector, we don't have to count a member presents in calculateDependencies().

dtemirbulatov updated this revision to Diff 171890.EditedOct 31 2018, 4:29 AM

Implemented Map<Instruction*, std::pair<Value *Parent, unsigned Opcode> indexing for ScalarToTreeEntry, PseudoInstScheduleDataMap.
Added reorderBundles() function to reorder bundles that have common instructions and common instructions were scheduled at least twice. We don't want to note which bundle was scheduled first. We could determine instruction layout after SLP scheduling.

RKSimon added inline comments.Nov 5 2018, 9:42 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
270

Repeated Instruction::SRem

RKSimon added inline comments.Nov 5 2018, 9:46 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
1617

@spatel @dtemirbulatov Can we use getBinOpIdentity yet ?

ABataev added inline comments.Nov 5 2018, 10:04 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
274

I still think it is better to limit this with the list of supported operations rather than the list of the unsupported ones.

1177

Very strange that you still need particular scheduling class for the pseudo instructions, I think you can use the original data structure if you correctly implement the pseudo instruction itself. I still don't see all the changes we talked about.

spatel added inline comments.Nov 5 2018, 2:27 PM
lib/Transforms/Vectorize/SLPVectorizer.cpp
1617

Yes - if anyone has suggestions for making that 'AllowRHSConstant' param clearer, let me know.

The only problem that I see is that this code is returning +0.0 as the default constant for an fadd (because the caller guarantees 'nsz'?).
I worked around something like that here:
rL346143

I can't tell if SLP would want to do something like that.

dtemirbulatov added inline comments.Nov 7 2018, 8:11 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
270

oh, thanks. I missed that.

274

ok, I will change that.

1177

ok, I am implementing now.

1617

ok, yes, looks like it is going to work.

Looks like I fixed all previous remarks also during testing I found two more issues with the change and I fixed both.

dtemirbulatov added inline comments.Nov 14 2018, 10:22 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
4154

We have to clear all dependancy calculations since the pseudo instruction might use already calculated SD node with calculated dependency, look at scheduling_pseudo.ll testcase.

4663

Please check schedule-bundle1.ll testcase, without this change scheduling is not correct.

dtemirbulatov updated this revision to Diff 174063.EditedNov 14 2018, 10:30 AM

Oops, I found a few typos, Formated tryToRepresentAsInstArg() and removed the second SRem from isRemainder().

ABataev added inline comments.Nov 14 2018, 10:44 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
270

Still 2 SRems

396

Restore the original function here.

449

You definitely need comments here

863–864

Instead of ArrayRef<Value *> I expected to see something like ArrayRef<InstructionOrPseudoInstruction>, where InstructionOrPseudoInstruction is the class that represents the instruction/Value itself, or the pseudo instruction.

886–890

Why do you need this?

898–908

Again, you need all this code because you did not implemented what we discussed. Try to use the InstructionOrPseudoInstruction like class to represent values/instructions and pseudo-instructions. It should be much easier to implement and a lot of changes will just go away.

@dtemirbulatov Any movement on this? It'd be great to get this in for the 8.0 release!

@dtemirbulatov Any movement on this? It'd be great to get this in for the 8.0 release!

yes, I just refactored getTreeEntry and ... according to ABataev request, I will update after additional testing in a day or two.

Introduced InstructionOrPseudo structure and removed "Instruction::Invoke" out of tryToRepresentAsInstArg() function with error example in invoke.ll testcase.

lebedev.ri added inline comments.Dec 7 2018, 6:58 AM
test/Transforms/SLPVectorizer/X86/invoke.ll
3 ↗(On Diff #177207)

Unless this currently crashes, precommit?

dtemirbulatov marked an inline comment as done.Dec 7 2018, 7:06 AM
dtemirbulatov added inline comments.
test/Transforms/SLPVectorizer/X86/invoke.ll
3 ↗(On Diff #177207)

yes, thanks, I will remove this line.

dtemirbulatov marked an inline comment as done.Dec 7 2018, 7:23 AM
dtemirbulatov added inline comments.
test/Transforms/SLPVectorizer/X86/invoke.ll
3 ↗(On Diff #177207)

Oh, No crash currently on pre-commit, I will remove the whole test.

lebedev.ri added inline comments.Dec 7 2018, 7:26 AM
test/Transforms/SLPVectorizer/X86/invoke.ll
3 ↗(On Diff #177207)

What i have meant to say is, unless this test file currently causes a crash, it would be best to commit it now, thus the diff will show the diff, and not a whole new test file.

dtemirbulatov marked an inline comment as done.Dec 7 2018, 7:44 AM
dtemirbulatov added inline comments.
test/Transforms/SLPVectorizer/X86/invoke.ll
3 ↗(On Diff #177207)

Ok

ABataev added inline comments.Dec 7 2018, 8:41 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
339–356

Turn this to a class. Also, not enough comments, the design of the structure also should be improved. Currently, it is too complex. Maybe base class + templates will help.

407

Please, follow the coding standard in the new code. Also, some of the functions can be replaced by some standard functions, just like in this case it can be replaced by llvm::all_of

423

I think, the pseudo instruction is always can be considered as the one with the same block, because you can put it easily into the current block

438

I think this function also should accept ArrayRef<InstructionOrPseudo *> as an input param

814

Left and Right must be SmallVectorImpl<InstructionOrPseudo *> &?

860

Restore the original

898–908

Also, seems to me this must be an InstructionOrPseudoOp

1042

Why do you need this new function? No comments and explanation.

1087

Pseudo instruction must be just ignored?

1189–1190

Again, I think it must be InstructionOrPseudoOp

1260–1261

Again, InstructionOrPseudoOp

1308

If you implement InstructionOrPseudoOp correctly, this reorder stuff should not be required, I think

1321–1322

Why you cannot store everything in a single map?

dtemirbulatov marked an inline comment as done.Dec 28 2018, 6:29 AM
dtemirbulatov added inline comments.
lib/Transforms/Vectorize/SLPVectorizer.cpp
1308

well, There should be several(>=2) independent scheduling events(one for real instruction and other for pseudos) and there is just one real instruction, in the end, I don't see how it could be done without reordering or tracking the last scheduled instance for the same instruction. We could introduce something like IsLastScheduled field in ScheduleData struct, but it would be quite similar to reordering.

ABataev added inline comments.Dec 28 2018, 6:39 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
1308

If you add the real instruction instead of this pseudoinstruction, will you need all these scheduling events? No. Will you need to do some extra reordering etc.? No. Why you cannot simulate it with the new class/structure?

dtemirbulatov marked an inline comment as done.Dec 28 2018, 7:20 AM
dtemirbulatov added inline comments.
lib/Transforms/Vectorize/SLPVectorizer.cpp
1308

do you mean that the last one scheduling becomes the real one and we just ignore any pseudos?

dtemirbulatov marked an inline comment as done.Dec 28 2018, 7:22 AM
dtemirbulatov added inline comments.
lib/Transforms/Vectorize/SLPVectorizer.cpp
1308

do you mean that the last one scheduled becomes the real one and we just ignore any pseudos?

ABataev added inline comments.Dec 28 2018, 7:24 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
1308

No. I mean, that if you insert the real instructions instead of those pseudo-instructions, you won't need all that reordering/new scheduling etc. Why can't you mimic this behavior with the pseudo-instruction?

dtemirbulatov marked an inline comment as done.Dec 28 2018, 10:22 AM
dtemirbulatov added inline comments.
lib/Transforms/Vectorize/SLPVectorizer.cpp
1308

hmm, There are at least NextLoadStore dependancies that we break if we, for example, insert real Load instruction somewhere. Or with could recalculate NextLoadStore. Or maybe mimic pseudo in some another way.

ABataev added inline comments.Dec 28 2018, 10:24 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
1308

I don't think that they can be broken as we're not going to insert new Loads/Stores, just some binops. SO, the loads/stores and the the corresponding dependencies should not be affected.

removed bundle reordering by replacing pseudo instructions with real ones.

dtemirbulatov marked 12 inline comments as done.Jan 18 2019, 12:19 PM
dtemirbulatov added inline comments.
lib/Transforms/Vectorize/SLPVectorizer.cpp
339–356

I see this class as just one, BTW it is a container.

423

I have this functionality, but let us for now minimize functionality change here, I will follow this change right after that check-in.

814

That change further increases the size of this review, we can change that later.

898–908

no, looks like "Instruction" here is more convenient.