Page MenuHomePhabricator

Add an optional cache to computeKnownBits.
Needs ReviewPublic

Authored by jlebar on Sep 15 2022, 8:02 PM.

Details

Summary

BasicAA (via MemoryDependenceResults, via GVN) may make many calls to
computeKnownBits. On my testcase (not able to share, sorry), this cache
reduces the time spent in computeKnownBits from 2s to 270ms, a 7.5x
speedup.

Diff Detail

Event Timeline

jlebar created this revision.Sep 15 2022, 8:02 PM
jlebar requested review of this revision.Sep 15 2022, 8:02 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 15 2022, 8:02 PM
jlebar added a subscriber: mkuper.
nikic added a subscriber: nikic.
nikic added inline comments.
llvm/include/llvm/Analysis/AliasAnalysis.h
41

Incorrect include paths

llvm/include/llvm/Analysis/ValueTracking.h
52

This is incorrect, the context instruction is used to check applicability of assumes for example, see isValidAssumeForContext().

foad added a comment.Sep 16 2022, 1:55 AM

from 2 to 270ms

Makes it sound like a 135x slow down.

foad added inline comments.Sep 16 2022, 1:57 AM
llvm/include/llvm/Analysis/ValueTracking.h
49

If you want to reformat this file can you do it first as a separate NFC patch please?

jlebar added inline comments.Sep 16 2022, 10:41 AM
llvm/include/llvm/Analysis/ValueTracking.h
52

isGuaranteedToTransferExecutionToSuccessor, wow, TIL. Thanks.

But, very sad: With the change to look at the real CtxI rather than its BB, this no longer provides a significant speedup.

This all seems so silly, because in 99.9% of cases the BB terminator really *is* sufficient.

Any thoughts? Or do you think I should give up on this?

jlebar edited the summary of this revision. (Show Details)Sep 16 2022, 10:41 AM
jlebar added inline comments.Sep 16 2022, 10:48 AM
llvm/include/llvm/Analysis/ValueTracking.h
49

I didn't really *want* to reformat the file, but arc was doing it for me because I touched these lines... :-/

Anyway, I pushed 8cc3bfd13f3135985e5b15ee65f2fc43239fb9fe.

nikic added a comment.Sep 21 2022, 1:08 AM

BasicAA (via MemoryDependenceResults, via GVN) may make many calls to
computeKnownBits. On my testcase (not able to share, sorry), this cache
reduces the time spent in computeKnownBits from 2s to 270ms, a 7.5x
speedup.

MDA has a known problem with overly large recursion cutoffs. I'd be interested to know whether passing -memdep-block-number-limit=100 would improve compile-time for your case.

llvm/include/llvm/Analysis/ValueTracking.h
52

I don't have any great ideas on how to do more effective caching without affecting result quality. I guess we could decouple the context-independent from the context-dependent parts somehow, but given that they are currently interleaved, this doesn't sound easy.

MDA has a known problem with overly large recursion cutoffs. I'd be interested to know whether passing -memdep-block-number-limit=100 would improve compile-time for your case.

Turns out that tweaking the memdep limits did help in my case.

So I guess I'm happy dropping this patch. Though I *do* feel like there's still something here.