This is an archive of the discontinued LLVM Phabricator instance.

[ADCE][NFC] Batch DT updates together
ClosedPublic

Authored by qcolombet on Jan 4 2022, 11:22 AM.

Details

Summary

This patch delayed the updates of the dominator tree to the very end of the pass instead of doing that in small increments after each basic block.

This improves the runtime of the pass in particular in pathological cases because now the updater sees the full extend of the updates and can decide whether it is faster to apply the changes incrementally or just recompute the full tree from scratch.
Put differently, thanks to this patch, we can take advantage of the improvements that Chijun Sima <simachijun@gmail.com> made in the dominator tree updater a while ago with commit 32fd196cbf4d: "Teach the DominatorTree fallback to recalculation when applying updates to speedup JT (PR37929)".

This change is NFC but can improve the runtime of the compiler dramatically in some pathological cases (where the pass was pushing a lot (several thousands) of small updates (less than 6)).
For instance on the motivating example we went from 300+ sec to less than a second.

Diff Detail

Event Timeline

qcolombet created this revision.Jan 4 2022, 11:22 AM
qcolombet requested review of this revision.Jan 4 2022, 11:22 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 4 2022, 11:22 AM
kuhar added a comment.Jan 4 2022, 11:30 AM

This change is NFC but can improve the runtime of the compiler dramatically in some pathological cases (where the pass was pushing a lot (several thousands) of small updates (less than 6)).
For instance on the motivating example we went from 300+ sec to less than a second.

Nice!

llvm/lib/Transforms/Scalar/ADCE.cpp
582

How did you pick this constant?

636

How about we move this as an early exit in DomTreeUpdater::applyUpdates?

qcolombet added inline comments.Jan 4 2022, 2:38 PM
llvm/lib/Transforms/Scalar/ADCE.cpp
582

Randomly to be honest.

It was 4 for the incremental updates and I figured we need to increase this size while batching more updates together while still hitting the fast path of the container.

636

It is, more or less, already there (this is buried in the domtreeconstruction update code), I just didn't want to assume anything on the client side, plus we save on the instantiation of the DomTreeUpdater all together.

The first iteration of the patch didn't have this check.
Put differently, I am happy to get rid of it if you believe it's not helpful.

kuhar added inline comments.Jan 4 2022, 2:48 PM
llvm/lib/Transforms/Scalar/ADCE.cpp
582

Could you either generate some statistics based on some real-world code bases and the pathological test cases mentioned in the patch description, or leave the constant out and let it SmallVector pick the small size automatically?

If you need test bitcode, I have some old whole-program bitcode laying around here: https://drive.google.com/drive/folders/1VJpym19cW-8BVgdtl2MsD3zB4CoEQ93O. You can just feed it to opt.

636

I think it's fine as-is.

qcolombet added inline comments.Jan 4 2022, 3:11 PM
llvm/lib/Transforms/Scalar/ADCE.cpp
582

Could you either generate some statistics based on some real-world code bases

I'll do that thanks for the pointer.

... the pathological test cases

The pathological test cases are less interesting because they involve thousands of edges and that's not a small vector at this point :).

qcolombet added inline comments.Jan 4 2022, 6:30 PM
llvm/lib/Transforms/Scalar/ADCE.cpp
582

Here are some statistics based on the bitcode files you've shared. We delete 733 edges total on the whole suite.
Ignoring the cases where we don't remove anything, the number of edges to delete:

  • Average at 1.5
  • Median is 1
  • The max is 36 (happens once) and the next max is 10 (happens 5 times)

All in all, if we leave the size at 4, with that suite, we hit the fast storage in ~97.3% of the cases.
If we bump it to 16 (like I did here), we hit the fast storage in ~99.9% of the cases.

kuhar added inline comments.Jan 4 2022, 7:36 PM
llvm/lib/Transforms/Scalar/ADCE.cpp
582

Thanks! I'm surprised the max is so small. <= 10 or the default size sound good to me then.

qcolombet updated this revision to Diff 397701.Jan 5 2022, 1:41 PM
  • Set the default smallvector size to 10
kuhar accepted this revision.Jan 5 2022, 1:42 PM

LGTM

This revision is now accepted and ready to land.Jan 5 2022, 1:42 PM
qcolombet added inline comments.Jan 5 2022, 1:42 PM
llvm/lib/Transforms/Scalar/ADCE.cpp
582

That was surprising to me too to be honest :).

I got these numbers by running opt with -O3.

This revision was landed with ongoing or failed builds.Jan 5 2022, 2:09 PM
This revision was automatically updated to reflect the committed changes.