This is an archive of the discontinued LLVM Phabricator instance.

[GISel][KnownBits]{NFC} Add a cache mechanism to speed compile time
ClosedPublic

Authored by qcolombet on Feb 20 2020, 6:12 PM.

Details

Summary

This patch adds a cache that is valid only for the duration of a call
to getKnownBits. With such short lived cache we avoid all the problems
of cache invalidation while still getting the benefits of reusing
the information we already computed.

This cache is useful whenever an instruction occurs more than once
in a chain of computation.
E.g.,
v0 = G_ADD v1, v2
v3 = G_ADD v0, v1

Previously we would compute the known bits for:
v1, v2, v0, then v1 again and finally v3.

With the patch, now we won't have to recompute v1 again.

NFC

Diff Detail

Event Timeline

qcolombet created this revision.Feb 20 2020, 6:12 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 20 2020, 6:12 PM
qcolombet retitled this revision from [GISel][KnownBits] Add a cache mechanism to speed compile time to [GISel][KnownBits]{NFC} Add a cache mechanism to speed compile time.Feb 20 2020, 6:23 PM
qcolombet edited the summary of this revision. (Show Details)
arsenm added inline comments.Feb 20 2020, 6:32 PM
llvm/include/llvm/CodeGen/GlobalISel/GISelKnownBits.h
37

This assumes an instruction with a single def

llvm/lib/CodeGen/GlobalISel/GISelKnownBits.cpp
119

This should at least skip for instructions with multiple defs if not handle them

qcolombet marked an inline comment as done.

Switch the caching to register instead of instruction.

qcolombet marked 2 inline comments as done.Feb 21 2020, 10:15 AM
qcolombet added inline comments.
llvm/include/llvm/CodeGen/GlobalISel/GISelKnownBits.h
37

Good point. I'll do the mapping with a Register instead.

dsanders accepted this revision.Feb 21 2020, 11:22 AM

LGTM

llvm/include/llvm/CodeGen/GlobalISel/GISelKnownBits.h
37

Thanks. My long term plan for the intra/inter-pass cache is based about caching based on the register too. The thinking being that passes can't cause known bits of a vreg to become unknown or change without also breaking their contracts (you might not be able to tell that a known bit is still known but it still is). Legalize/Combine/ISel/etc. are all required to emit functionally equivalent replacements. They'll sometimes make undefined bits known so the stale cache issue still matters for optimization but not for correctness.

This revision is now accepted and ready to land.Feb 21 2020, 11:22 AM
qcolombet marked an inline comment as done.Feb 21 2020, 11:31 AM

My long term plan for the intra/inter-pass cache is based about caching based on the register too.

That seems fragile to do that generally. I agree that is less likely that a register changes, but it is not programmatically impossible. E.g., we could imagine that register numbers get recycle and then we would hit a cache entry that was not intended for a register.

Bottom line, whatever caching we do, it will have to be bound to some kind of observer in my opinion.

My long term plan for the intra/inter-pass cache is based about caching based on the register too.

That seems fragile to do that generally. I agree that is less likely that a register changes, but it is not programmatically impossible. E.g., we could imagine that register numbers get recycle and then we would hit a cache entry that was not intended for a register.

Bottom line, whatever caching we do, it will have to be bound to some kind of observer in my opinion.

We don't recycle vreg numbers today and I'm not aware of any plans to do so. There's also no API to delete a vreg. It's technically possible to change the type but that's potentially dangerous.

At the moment, everyone calls createVirtualRegister() when they need one that didn't exist before and the pre-existing defs at worst become unused by some/all of their uses. A vreg def is never replaced by the def of an instruction that isn't computing (or finishing computing) the same value.

My long term plan for the intra/inter-pass cache is based about caching based on the register too.

That seems fragile to do that generally. I agree that is less likely that a register changes, but it is not programmatically impossible. E.g., we could imagine that register numbers get recycle and then we would hit a cache entry that was not intended for a register.

Bottom line, whatever caching we do, it will have to be bound to some kind of observer in my opinion.

We don't recycle vreg numbers today and I'm not aware of any plans to do so. There's also no API to delete a vreg. It's technically possible to change the type but that's potentially dangerous.

At the moment, everyone calls createVirtualRegister() when they need one that didn't exist before and the pre-existing defs at worst become unused by some/all of their uses. A vreg def is never replaced by the def of an instruction that isn't computing (or finishing computing) the same value.

You could swap two unrelated registers with the asme type by uisng replaceRegWith, though I don't know why you would ever do that

My long term plan for the intra/inter-pass cache is based about caching based on the register too.

That seems fragile to do that generally. I agree that is less likely that a register changes, but it is not programmatically impossible. E.g., we could imagine that register numbers get recycle and then we would hit a cache entry that was not intended for a register.

Bottom line, whatever caching we do, it will have to be bound to some kind of observer in my opinion.

We don't recycle vreg numbers today and I'm not aware of any plans to do so. There's also no API to delete a vreg. It's technically possible to change the type but that's potentially dangerous.

At the moment, everyone calls createVirtualRegister() when they need one that didn't exist before and the pre-existing defs at worst become unused by some/all of their uses. A vreg def is never replaced by the def of an instruction that isn't computing (or finishing computing) the same value.

You could swap two unrelated registers with the asme type by uisng replaceRegWith, though I don't know why you would ever do that

That's true (although you'd need a third vreg with no defs/uses to do it) but as you say there's no reason to do so.

That's true (although you'd need a third vreg with no defs/uses to do it) but as you say there's no reason to do so.

That's my point. Even if there is no reason to do so, it is possible.
What I am saying is a mechanism that solely relies on registers not changing value is not bug free.

aemerson added inline comments.Feb 21 2020, 12:33 PM
llvm/include/llvm/CodeGen/GlobalISel/GISelKnownBits.h
37

Would it be beneficial to use a SmallDenseMap here?

That's true (although you'd need a third vreg with no defs/uses to do it) but as you say there's no reason to do so.

That's my point. Even if there is no reason to do so, it is possible.
What I am saying is a mechanism that solely relies on registers not changing value is not bug free.

Just because you said 'registers' there I should clarify that it relies on vregs not changing value (which they don't). Physical registers are different matter and those won't be cachable between getKnownBits() calls.

I think the argument that because it's possible to do the wrong thing if you try hard enough that you have to protect against that is rather flawed and inconsistent with the existing code in LLVM. For example, it's possible to create vregs with multiple defs (and quite easy to do by accident) but we don't guard against that in release/debug builds because it's a breach of the SSA contract. We do of course guard against it when the machine verifier is enabled so we find out if somebody violates the contract. That same trust but verify with additional tools approach is appropriate here too. A related issue for that is the ability to serialize analysis passes in MIR so we can run both with a freshly run analysis and maintained data from a previous pass.

If a pass really insists on doing something weird that breaks the contract there will be ways to flush the cache (both fine grain, and the entire thing) just like any other analysis pass but the code breaking the contract will be responsible for repairing the analysis even if it's as simple as telling the pass manager to re-run the analysis (or more accurately not telling the pass manager that the pass preserves it).

I think the argument that because it's possible to do the wrong thing if you try hard enough that you have to protect against that is rather flawed and inconsistent with the existing code in LLVM.

I disagree. We never said or enforced that virtual registers are not reused (BTW we do that all the time after SSA, for instance in the register coalescer, but let us stick to SSA passes.)
Your example with SSA is different: this is a well defined, checked and enforced property, unlike the virtual register property you want.

Even if we were to decide that yes, virtual registers must not change values overtime (while in SSA form), this is not something we could check or enforce, therefore we open ourselves to hard to track bugs. To me that is a design flaw.

qcolombet marked an inline comment as done.Feb 21 2020, 1:19 PM
qcolombet added inline comments.
llvm/include/llvm/CodeGen/GlobalISel/GISelKnownBits.h
37

That certainly wouldn't hurt :).

Thanks for pointing it out.

qcolombet updated this revision to Diff 245987.Feb 21 2020, 2:01 PM
  • Use SmallDenseMap instead of DenseMap
qcolombet marked an inline comment as done.Feb 21 2020, 2:01 PM

I've spoken to Quentin offline and it seems we're not in as much disagreement as it appears. The main issues are:

  1. Non-SSA passes aren't included in the plan (I'm focused on IRTranslator to InstructionSelector which are all SSA)
  2. Detecting when the contract is violated
  3. Not having to write maintenance code

I'll give some more thought to those before I (eventually) make the proposal. I think I have 3. covered, 2. is broadly ok but certainly has room for improvement. For 1. I suggested an off-the-cuff solution that abstracted the cache key to allow implementations to compute a value number by some means instead of directly using the vreg number but it needs some proper thought.

Quentin also mentioned that we should be able to take information from the frontend and I agree with that.

@qcolombet: Does that match your view of our conversation?

I think the argument that because it's possible to do the wrong thing if you try hard enough that you have to protect against that is rather flawed and inconsistent with the existing code in LLVM.

I disagree. We never said or enforced that virtual registers are not reused

It's true that we've never said or enforced that. However, for Legalizer, RegBankSelect, Combine, and InstructionSelect it's what's been done in practice as a result of what's needed to achieve correctness for these passes. I do agree that we need to be able to catch when somebody did something weird here. That said, personally I don't feel we need to account for things like pointlessly shuffling vreg numbers. If we can catch that too with a perfect verifier then that's great and makes us very confident but I feel the required scope of the verifier (particularly in the context of a partial known-bits implementation) should be programming mistakes rather than things that would only arise due to intentional attacks on the algorithm.

Part of the eventual proposal will be that so long as you follow that additional constraint, you will get maintenance free preservation of the known bits analysis. If you don't follow that additional constraint then you must either not preserve the analysis, purge affected vregs from the cache, or repair the information. Effectively you opt in to a requirement to either follow an additional constraint, or repair/purge data that didn't follow it. The default is for known-bits to own the maintenance burden which would be done in a very conservative manner, being limited to caching within the getKnownBits() call but not between them and that the pass manager destroys any data left after the pass.

Your example with SSA is different: this is a well defined, checked and enforced property, unlike the virtual register property you want.

Even if we were to decide that yes, virtual registers must not change values overtime (while in SSA form), this is not something we could check or enforce, therefore we open ourselves to hard to track bugs. To me that is a design flaw.

The verifier I had in mind can check and enforce direct contradictions (known bits flipping to the other value, or simultaneously claiming to be both values). It's not able to check things the analysis has newly learned (unknown bits becoming known) through new code being more analyzable by being implemented or by being within the search depth. Similarly it can't check things that the analysis has forgotten (known bits becoming unknown) through an incomplete implementation or being outside the search depth. The search depth issue is less of a problem in the context of caching though as the cache permits a much larger search as information can transfer from one query to another even if different instructions are queried.

In addition to the verifier, I want to be able to use analysis serialization to detect behaviour differences arising from a preserved analysis compared to a fresh one. These differences don't necessarily mean that there's a bug, it merely allows us to test the preserved case which isn't possible with -stop-before/-start-before/-run-pass today

qcolombet closed this revision.Feb 21 2020, 3:01 PM

To https://github.com/llvm/llvm-project.git

9708279c725..618dec2aeff  master -> master

Does that match your view of our conversation?

Correct!

Thanks for writing a summary up.