Page MenuHomePhabricator

[SLP] Optimize reschedule of previously scheduled bundle member [NFC]
AbandonedPublic

Authored by reames on Jan 22 2022, 8:17 AM.

Details

Summary

The case being handled is when we are trying to form a bundle from a set of instructions where at least one is a transitive use of another existing tree node (i.e. has already been scheduled).

For context, the effect of bundling is effectively to add an extra scheduling dependency that all instructions within the bundle can be scheduled together. This means that all nodes making up the bundle will be scheduled at the point that *all* instructions making up the bundle are ready. This dependency is implicit and not explicitly materialized in the dependency sets.

The old code handled this by blowing away the entire schedule and all dependency information. This patch follows from the observations that a) bundling doesn't change (explicitly tracked) dependencies at all, and b) since we have a strictly bottom up scheduler the scheduling of users of the new bundle is not effected by the exact placement of the bundle instructions.

As such, we can parallel the upwards def walk that schedule does to remove dependencies, and unschedule only nodes reachable through the upwards def chain.

(For anyone aware of my MSSA patch, this change is completely unrelated - other than it cleans up some of the same code.)

Diff Detail

Event Timeline

reames created this revision.Jan 22 2022, 8:17 AM
reames requested review of this revision.Jan 22 2022, 8:17 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 22 2022, 8:17 AM
ABataev added inline comments.Jan 24 2022, 12:17 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2596–2597

Use /// style of comments here

2599

Asserting message?

2602–2603

SmallVector<ScheduleData *> Worklist(1, SD);

2613

during

2627

Can we get PHINode here? Plus, extracts and PHIs are completely excluded from scheduling, I assume you can skip such instructions. Same about insertelements

2644

Asserting message.

2657

Asserting message.

2662–2663

Extra breaks

reames added inline comments.Jan 27 2022, 1:46 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2599

Doesn't seem to add any value and we are inconsistent about them across codebase.

I really don't care here, if you want it added, please give me wording and I'll copy-and-paste.

2602–2603

Not idiomatic, and frankly, confusing.

2613

Comment copied from existing code. I will fix both in a followon commit.

2627

This is a good question to which I didn't know the answer.

From digging, it looks like we can't directly reach this code with a phi. Reason being that scheduling a phi bundle will always fail without trying to extend.

On the other hand, the phi could be an operand to another bundle which is scheduled, and then unscheduled. But, and I think this is the key bit, the phis should never be in the resulting scheduling window. As such, they can't ever be marked "scheduled" and we'll push them to the worklist, then ignore them.

As such, the code appears to be correct as written, if admittedly, accidentally.

reames updated this revision to Diff 403791.Jan 27 2022, 1:47 PM

Address minor style comments.

ABataev added inline comments.Jan 27 2022, 1:56 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2599

We try to add messages in a new/modified code.

2602–2603

We use this style of worklist initialization at least in SLP vectorizer. Plus, the code requires formatting.

reames added inline comments.Jan 27 2022, 2:02 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2599

This is not in the programming manual. I checked.

reames updated this revision to Diff 403798.Jan 27 2022, 2:14 PM

Address nitpicks.

reames added inline comments.Jan 31 2022, 12:07 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2599

Though, I apparently didn't check hard enough. It is in the style guide. Hm, wonder when that changed?

Currently this patch crashes the compiler for SPECCPU2006:

0.      Program arguments: clang++ -DNDEBUG -march=native -O3 -fomit-frame-pointer -flto -DNDEBUG -w -Werror=date-time -save-stats=obj -DSPEC_CPU -DSPEC_CPU_LINUX -DSPEC_CPU_X64 -DSPEC_CPU_LITTLEENDIAN -DSPEC_CPU_LITTLE_ENDIAN -DSPEC_CPU_LP64 -save-stats=obj -MD -MT External/SPEC/CFP2006/444.namd/CMakeFiles/444.namd.dir/Spec2006/benchspec/CPU2006/444.namd/src/ComputeNonbondedUtil.C.o -MF External/SPEC/CFP2006/444.namd/CMakeFiles/444.namd.dir/Spec2006/benchspec/CPU2006/444.namd/src/ComputeNonbondedUtil.C.o.d -o External/SPEC/CFP2006/444.namd/CMakeFiles/444.namd.dir/Spec2006/benchspec/CPU2006/444.namd/src/ComputeNonbondedUtil.C.o -c Spec2006/benchspec/CPU2006/444.namd/src/ComputeNonbondedUtil.C
1.      <eof> parser at end of file
2.      Optimizer
 #0 0x00000000066b57e3 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (clang+++0x66b57e3)
 #1 0x00000000066b345e llvm::sys::RunSignalHandlers() (clang+++0x66b345e)
 #2 0x00000000066b4b0d llvm::sys::CleanupOnSignal(unsigned long) (clang+++0x66b4b0d)
 #3 0x0000000006612c43 (anonymous namespace)::CrashRecoveryContextImpl::HandleCrash(int, unsigned long) CrashRecoveryContext.cpp:0:0
 #4 0x0000000006612e1e CrashRecoverySignalHandler(int) CrashRecoveryContext.cpp:0:0
 #5 0x00007fe8e1199b20 __restore_rt sigaction.c:0:0
 #6 0x00007fe8dfc7a37f raise (/lib64/libc.so.6+0x3737f)
 #7 0x00007fe8dfc64db5 abort (/lib64/libc.so.6+0x21db5)
 #8 0x00007fe8dfc64c89 _nl_load_domain.cold.0 loadmsgcat.c:0:0
 #9 0x00007fe8dfc72a76 .annobin___GI___assert_fail.end assert.c:0:0
#10 0x00000000068ee7c4 llvm::slpvectorizer::BoUpSLP::scheduleBlock(llvm::slpvectorizer::BoUpSLP::BlockScheduling*) (clang+++0x68ee7c4)
#11 0x00000000068ec5bc llvm::slpvectorizer::BoUpSLP::vectorizeTree(llvm::MapVector<llvm::Value*, llvm::SmallVector<llvm::Instruction*, 2u>, llvm::DenseMap<llvm::Value*, unsigned int, llvm::DenseMapInfo<llvm::Value*, void>, llvm::detail::D
enseMapPair<llvm::Value*, unsigned int> >, std::vector<std::pair<llvm::Value*, llvm::SmallVector<llvm::Instruction*, 2u> >, std::allocator<std::pair<llvm::Value*, llvm::SmallVector<llvm::Instruction*, 2u> > > > >&) (/clang+++0x68ec5bc)
...
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2613

during

reames added a comment.Feb 3 2022, 9:10 AM

Currently this patch crashes the compiler for SPECCPU2006:

I don't have a copy of SPEC2006, can you provide a preprocessed source file or IR file?  The stack trace is not enough to guess the cause.

Currently this patch crashes the compiler for SPECCPU2006:

I don't have a copy of SPEC2006, can you provide a preprocessed source file or IR file?  The stack trace is not enough to guess the cause.

Will try to prepare a reproducer today.

Currently this patch crashes the compiler for SPECCPU2006:

I don't have a copy of SPEC2006, can you provide a preprocessed source file or IR file?  The stack trace is not enough to guess the cause.

Will try to prepare a reproducer today.

Thank you!

Currently this patch crashes the compiler for SPECCPU2006:

I don't have a copy of SPEC2006, can you provide a preprocessed source file or IR file?  The stack trace is not enough to guess the cause.
define void @test() {
for.end580:
  br i1 false, label %for.body602.preheader, label %for.cond794.preheader

for.body602.preheader:                            ; preds = %for.end580
  br label %for.body602

for.cond794.preheader:                            ; preds = %for.body602, %for.end580
  %fullElectEnergy.1.lcssa = phi double [ 0.000000e+00, %for.end580 ], [ %10, %for.body602 ]
  %electEnergy.1.lcssa = phi double [ 0.000000e+00, %for.end580 ], [ %3, %for.body602 ]
  ret void

for.body602:                                      ; preds = %for.body602, %for.body602.preheader
  %mul701 = fmul double 0.000000e+00, 0.000000e+00
  %mul703 = fmul double 0.000000e+00, 0.000000e+00
  %0 = call double @llvm.fmuladd.f64(double 0.000000e+00, double 0.000000e+00, double %mul701)
  %1 = call double @llvm.fmuladd.f64(double %0, double 0.000000e+00, double %mul703)
  %2 = call double @llvm.fmuladd.f64(double %1, double 0.000000e+00, double 0.000000e+00)
  %3 = call double @llvm.fmuladd.f64(double 0.000000e+00, double %2, double 0.000000e+00)
  %4 = call double @llvm.fmuladd.f64(double %mul701, double 0.000000e+00, double %mul703)
  store double %4, double* null, align 8
  %5 = load double, double* null, align 8
  %6 = load double, double* null, align 8
  %mul746 = fmul double 0.000000e+00, %6
  %mul747 = fmul double 0.000000e+00, %5
  %7 = call double @llvm.fmuladd.f64(double 0.000000e+00, double 0.000000e+00, double %mul746)
  %8 = call double @llvm.fmuladd.f64(double %7, double 0.000000e+00, double %mul747)
  %9 = call double @llvm.fmuladd.f64(double %8, double 0.000000e+00, double 0.000000e+00)
  %10 = call double @llvm.fmuladd.f64(double 0.000000e+00, double %9, double 0.000000e+00)
  br i1 false, label %for.cond794.preheader, label %for.body602
}

declare double @llvm.fmuladd.f64(double, double, double)
opt -slp-vectorizer -S -o -  ./repro.ll

Ok, I see the problem. I'd forgotten to reincrement the dependency counter when unscheduling. I'm amazed this worked at all.

reames abandoned this revision.Feb 4 2022, 1:14 PM

Once I dug into the failure reported, I realized that doing this correctly turned out to be much more complicated than I'd first thought. Given I don't have a strong motivation for this other than a minor performance win which doesn't even improve the example I'm currently targeting, I'm going to set this aside for now and return to it if warranted in the future.