This is an archive of the discontinued LLVM Phabricator instance.

[GVN] Support address translation through select instructions
Changes PlannedPublic

Authored by kachkov98 on Jan 27 2023, 4:54 AM.

Details

Summary

Process cases when phi incoming in predecessor block has select
instruction, and this select address is unavailable, but there
are addresses translated from both sides of select instruction.

Motivation
Consider the following PRE cases: https://godbolt.org/z/Mj6nb8qvs
In first test, latest load is partially redundant: both variants (A[i] and A[i+1]) are already available if i < N.
In second test, load of A[res] is redundant: both variants are availabe from previous iteration (A[res]) or from previous A[i] load. The only path where its unavailable is on first iteration from preheader block.

Both these cases are optimized properly by GCC.

Proposed fix in this patch is to support address translation through select instruction: if we had translation failure, but phi incoming is select instruction, we attempt to translate both sides of this selects and find loads from this pair of addresses. This can be treated as an extension of https://reviews.llvm.org/D118143.

Fixes https://github.com/llvm/llvm-project/issues/58569.

Diff Detail

Event Timeline

kachkov98 created this revision.Jan 27 2023, 4:54 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 27 2023, 4:54 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
kachkov98 requested review of this revision.Jan 27 2023, 4:54 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 27 2023, 4:54 AM
kachkov98 edited the summary of this revision. (Show Details)Jan 27 2023, 5:02 AM
kachkov98 added reviewers: nikic, fhahn, reames, mkazantsev.
kachkov98 edited the summary of this revision. (Show Details)Jan 27 2023, 5:03 AM
mkazantsev requested changes to this revision.Jan 29 2023, 11:53 PM
mkazantsev added inline comments.
llvm/include/llvm/Analysis/MemoryDependenceAnalysis.h
242

Name convention, pls pre-commit NFC to fix it first.

llvm/include/llvm/Analysis/PHITransAddr.h
31

Formatting?

100

Is it possible that two instructions in InstInputs are in the same block? If no, you can else return nullptr; here.

If it is possible, you have a bug, because two of them can be selects, and what you return is kind of random?

In either case, you need some assertions to ensure prerequisites you are relying on here.

125

Three slashes.
Use \p when addressing parameters.

128

name conventions here are widely broken, functions are supposed to be in camelCase.

130

const DominatorTree &DT or it can be null?

149–154

Any explanation what are Sel and Cond?

llvm/lib/Analysis/PHITransAddr.cpp
160

Check on Sel != nullptr is not needed because V is never null. It cannot be equal to Sel = null.

161

Why did't you just pass Cond ? Sel->getTrueValue() : Sel->getFalseValue() as parameter? Cond is not used for anything else, or I'm missing something?

294–295

stray change

336

Why not

Value *TrueAddr = PHITransAddr(*this).PHITranslateSubExpr(Addr, CurBB, PredBB,
                                                          DT, Sel->getTrueValue());
Value *FalseAddr = PHITransAddr(*this).PHITranslateSubExpr(
    Addr, CurBB, PredBB, DT, Sel->getFalseValue());

?

llvm/lib/Transforms/Scalar/GVN.cpp
1266

Make it separate asserts. If it fails, not clear what exactly. And also add them some description.

This revision now requires changes to proceed.Jan 29 2023, 11:53 PM
mkazantsev added inline comments.Jan 29 2023, 11:54 PM
llvm/lib/Analysis/PHITransAddr.cpp
336

Also, is PHITransAddr(*this) twice necessary?

nikic added inline comments.Feb 1 2023, 12:17 PM
llvm/include/llvm/Analysis/PHITransAddr.h
44

Unless std::variant is smarter than I think, you probably want to use a pair of Values instead, and determine the variant based on whether the second is nullptr. Otherwise you'll store a redundant type tag.

Since naming conventions are indeed broken across many PHITransAddr methods, maybe it's worth to clean up them first.

Patches:
https://reviews.llvm.org/D143166
https://reviews.llvm.org/D143167

kachkov98 added inline comments.Feb 2 2023, 4:09 AM
llvm/include/llvm/Analysis/PHITransAddr.h
44

Thank you for the idea! Now we are checking that both select addresses != nullptr a bit late, but after moving this check earlier this can be implemented without tag.

100

In theory it's possible, and we can do smarter things in such cases. For example, if we have multiple select inputs with same Cond value, address still can be translated for Cond = true and Cond = false, and for that reason I'm passing Cond value and bool flag instead of select true or false value directly (like for translation of PHI nodes: we are passing CurBB->PredBB edge, not phi incoming value directly). However, on my experiments with SPEC2017, there was no cases with multiple select inputs, so maybe we could simplify this without any practical loss.

kachkov98 updated this revision to Diff 495088.Feb 6 2023, 4:49 AM

Address review comments

kachkov98 marked 8 inline comments as done and 2 inline comments as done.Feb 6 2023, 4:58 AM
kachkov98 added inline comments.
llvm/include/llvm/Analysis/PHITransAddr.h
44

New implementation is using just pair of Values.

130

Just like with translateValue(BasicBlock *, BasicBlock *, const DominatorTree *, bool) overload, DominatorTree is optional parameter. It looks like we always pass it though, but leaving it as pointer gives more flexibility.

llvm/lib/Analysis/PHITransAddr.cpp
161

Refactored this part: now we pass select condition and its value (true or false), so we can translate address even if it depends from multiple selects with same condition.

336

Unfortunately, it's required: PHITransSubExpr modifies InstInputs during its work, so we can't run this method twice for the same object.

mkazantsev added inline comments.Feb 6 2023, 10:24 AM
llvm/include/llvm/Analysis/PHITransAddr.h
47

cast already contains this assert.

49

Can we do it in constructor? Smth like

SelectAddr(Value *Addr) {
  if (auto *SI = dyn_cast<SelectInst>(Addr)) {
    V1 = SI->getTrueValue();
    V2 = SI->getFalseValue();
  } else {
    V1 = Addr;
    V2 = nullptr;
  }
}

and just return {V1, V2} here? I don't know if it's possible to make many queries here, but if yes, it'll definitely be faster.

In that case, you can also replace

Value *V1, *V2;

with

SelectAddrs  V;

and just return it.

100

With basic block check is now gone, my question still holds. What if there are many selects among InstInputs? This method's comment doesn't say anything about which condition it should return. Is there a guarantee that it'll be the first one, or it could theoretically return any of them? May this change or not? Is it important for the algorithm?

From how you're using it, it seems that what you *really* want is to return all conditions of all involved selects (in vector or set, I don't know if order matters). I'm completely fine if it's not done in this patch, but the specification of this method isn't clear about this situation, and I'm not sure this method is doing what intended.

llvm/lib/Analysis/MemoryDependenceAnalysis.cpp
1041

I don't understand what this code is doing. Imagine you meet smth like

select i1 %cond, i32 undef, i32 undef

which is also dead. The search will stop and return this instruction (?). And if it doesn't exist, it will maybe return nullopt. What was the intention here?

kachkov98 marked an inline comment as not done.Feb 7 2023, 2:10 AM

I want to put more context for this change that may cover some questions regarding the algorithm and its limitations. Let's consider the following simple case:

  BB1
  / |
BB2 |
  \ |
  BB3

Where BB3 contains load from address P, that translates to address P1 for BB1->BB3 edge and to address P2 for BB2->BB3 edge. For simplicity, let's assume there are no any clobbers in between. If BB1 has Defs for P1 and P2 (e.g. loads from these pointers), this is a case of full redundancy: we can insert a phi in BB3 (P1 Def for BB1->BB3 and P2 Def for BB2->BB3). However, current design of MemoryDependenceAnalysis can't handle it as explained in this comment. The reason is following: MemoryDependenceAnalysis goes back from BB3 block, trying to find dependency of load P. Moving from BB3 to BB2, it translates address to P2, moving from BB2 to BB1 address doesn't change, and in BB1 it detects Def of P2. Then it starts to explore other paths from BB3 (basically it's a backward DFS), and goes from BB3 to BB1, translating address to P1. After that we have a collision: BB1 was already visited with P2 address, and since addresses doesn't match, it bails out. To process this case, it should be able to propagate back multiple addresses at once: in BB1, address of load P may be actually load from P1 or P2, so now we should find both variants. However, we need to be careful here: number of such variants can grow exponentially after each basic block where paths with different addresses are met.

On practice, the important special case of previous example is select instruction: If BB2 is empty, SimplifyCFG transforms it to select. It has the same problem: we need to propagate back 2 addresses at once, and number of variants can also grow exponentially, e.g.:

%a = select %c1, %i1, %i2
%b = select %c2, %i3, %i4
%sum = add %a, %b
%P = GEP %ptr, %sum

(Here we have 4 variants of %P).
So the first made choice is to translate address only for one select condition, so we will always have only 2 addrs to propagate back. The next question is how to choose this condition: in theory, we can sort out all select conditions and try to find "best" one, but it looks like too much complication wthout any real improvements: on practice load addresses depends on one select at most, so this implementation just picks up the first found select. Current GVN implementation already has support of some cases where load address depends from select directly (it was refactored in https://reviews.llvm.org/D141619), and this patch tries to extend it for cases where address depends from select through other instructions (GEPs and casts). For this approach MemoryDependenceAnalysis still needs to report Def dependency from Select instruction, and this dependency instruction will be used to rematerialize available value in GVN (we need only its condition and position to insert materialized value, so reporting select i1 %cond, i32 undef, i32 undef is ok: we don't care about its true and false values, addresses for %cond = true and %cond = false will be passed separately).

(we need only its condition and position to insert materialized value, so reporting select i1 %cond, i32 undef, i32 undef is ok: we don't care about its true and false values, addresses for %cond = true and %cond = false will be passed separately).

Then why can't you just always insert it on this block's terminator? Why iterate through its instructions and try to find this garbage?

kachkov98 updated this revision to Diff 496424.Feb 10 2023, 4:15 AM

Insert select before predecessor's terminator to simplify clobbers search

(we need only its condition and position to insert materialized value, so reporting select i1 %cond, i32 undef, i32 undef is ok: we don't care about its true and false values, addresses for %cond = true and %cond = false will be passed separately).

Then why can't you just always insert it on this block's terminator? Why iterate through its instructions and try to find this garbage?

Tried to implement this suggestion. Before the change, we reported select dependency as Def with guarantee that there are no clobbers between load and this Def. Now, we report special Select dependency (without any dependency instruction attached into it) and try to find non-clobbered loads from either predecessor's terminator (non-local case) or from the load position (local case).

mkazantsev accepted this revision.Feb 19 2023, 10:19 PM

All my objections seem resolved, thanks. I'm no expert in MemDep, so I don't know what implications come from introduction of new dependency type, so you might want to have someone else's opinion on this. Otherwise, LG.

llvm/lib/Transforms/Scalar/GVN.cpp
204

init with nullptr?

1136

I'd just split them, to separate two issues (no addr and addr of wrong type).

This revision is now accepted and ready to land.Feb 19 2023, 10:19 PM

Thank you for the review! I will wait some time in case of other reviewers' objections and then merge it.

This revision was landed with ongoing or failed builds.Feb 27 2023, 1:08 AM
This revision was automatically updated to reflect the committed changes.
kachkov98 reopened this revision.Mar 1 2023, 12:43 AM

Patch was reverted due to issue witch cached MemDep result. There is a code like this:

%arrayidx248 = getelementptr inbounds [0 x i8], ptr @zz_lengths, i64 0, i64 %idxprom247
%cond.in.in = select i1 %switch8091, ptr %orec_size, ptr %arrayidx248
%cond.in = load i8, ptr %cond.in.in, align 1, !tbaa !5

And on second iteration of processBlock() %switch8091 value is proved to be false, so select is removed and replaced with %arrayidx248, but we didn't invalidate already cached SelectDep entry of load from first GVN iteration - need to figure out how to add this logic into MemoryDependenceResults::removeInstruction method.

This revision is now accepted and ready to land.Mar 1 2023, 12:43 AM
kachkov98 planned changes to this revision.Mar 1 2023, 12:44 AM