This is an archive of the discontinued LLVM Phabricator instance.

[Introduction] Redundant load reduction with invariant intrinsics
Needs ReviewPublic

Authored by lvoufo on Oct 9 2015, 2:07 PM.

Details

Summary

Based on the design doc at
https://docs.google.com/document/d/112O-Q_XrbrU1I4P-oiLCN9u86Qg_BYBdcDsmh7Pn9Nw/edit?usp=sharing,
it handles non-member C++ const objects.
Extensions are discussed in the design doc and will be submitted separately.

Diff Detail

Event Timeline

lvoufo updated this revision to Diff 36989.Oct 9 2015, 2:07 PM
lvoufo retitled this revision from to [Introduction] Redundant load reduction with invariant intrinsics.
lvoufo updated this object.
lvoufo added a subscriber: llvm-commits.
lvoufo updated this revision to Diff 36992.Oct 9 2015, 2:21 PM

Clang-formet'ed.

reames added a subscriber: reames.Oct 9 2015, 2:48 PM

This definitely needs to be split up into a series of smaller patches to
be reviewable. I strongly suggest starting small, getting through one
cycle of review, then posting a couple more and recursing. Reading
through the diff, I see a lot of points which will need
discussion/clarification.

Also, a bit of background on what you're trying to accomplish is
required. A link to a private google doc is not sufficient.

Philip

nlewycky added inline comments.Oct 9 2015, 3:16 PM
include/llvm/IR/GlobalVariable.h
32–33

Alphabetize.

48

This is a change to the concept of what a global variable in LLVM IR is, and therefore would require a matching LangRef change (and bitcode reader/writer change).

If you need to store information about a GV for a pass, please create a map<GlobalVariable*, IntrinsicInst*> in your pass.

include/llvm/IR/Instructions.h
36–37

Alphabetize.

79

Again, this is a fundamental change to what an alloca instruction is.

Question for you, what if there's more than one invariant call on this alloca?

Note that this doesn't form a use, so what if someone deletes the invariant intrinsic? This is left hanging and pointing to nothing, or worse someone could reallocate a new different instruction at the same address.

lib/Analysis/BasicAliasAnalysis.cpp
500

What if the GV has an invariant start instruction, but the query is from before the GV initializer has finished running? A "false negative" in this if-expression fails unsafe, skipping this block will cause us to go into code which is written to assume GV->isConstant is true. If that were sufficient, then there's no need to have getInvariantStartInstruction at all, you could just set isConstant instead.

You need to check that the invariant has already executed, and the way to do that is to see that it dominates (which currently we can only do in a function-local sense). I don't know what cases you're solving this way, but my stab-in-the-dark guess is that you should probably be getting them with changes to MemoryDependenceAnalysis instead; if you see an invariant.end you can skip the scan of instructions up to the matching invariant.start which it conveniently takes as its first argument. That way, you never even get into the point of querying basicaa about the instructions in between, therefore they don't appear to be potentially conflicting for any user of memdep (GVN, DSE, Memcpyopt).

lib/IR/Instructions.cpp
1155–1158

What about II->getParent() == this->getParent()?

lib/Transforms/IPO/GlobalOpt.cpp
2247–2250

I don't think you need this?

GlobalOpt performs symbolic execution of each instruction in order. It will see, and "execute" an invariant start instruction by inserting it into the "Invariants" set above. This is a more accurate model, since before the execution occurs it is not yet constant, and it becomes constant afterwards.

I'm not sure what cases processInvariantIntrinsics is catching which the existing code is not, and I'd like to review the GlobalOpt changes independently.

2640–2641

This is what the Evaluator does (except not just for invariant intrinsics). This code is a second evaluator inside the evaluator, please don't do that, just integrate with the rest of GlobalOpt.

2646–2648
for (Function::iterator BB = F->begin(), BE = F->end(); BB != BE; ++BB) {
  // Function::iterator implicitly converts into a BasicBlock*, so "BB->..." works fine
}
2648

"BasicBlock* NextBB" --> "BasicBlock *NextBB".

2683

So the only things Eval.getVal() will return non-null for are Arguments and GlobalValues, because you've entered processInvariantIntrinsics() before evaluating the function. Good news, that's not wrong, it just doesn't have the values.

The alternative of running it afterwards would fill in the values, but be actively wrong in a case where you're inside a loop; a loop value incrementing %x from 0 to 5 would show as '5' even if it were used inside the loop.

lib/Transforms/InstCombine/InstructionCombining.cpp
1966

I don't think that's right. Proposed counterexample:

declare {}* @llvm.invariant.start(i64 %size, i8* nocapture %ptr)
declare void @llvm.invariant.end({}* %start, i64 %size, i8* nocapture %ptr)
declare i8* @malloc(i32)
define void @foo() {
  %p = call i8* @malloc(i32 1)
  %inv = call {}* @llvm.invariant.start(i64 -1, i8* %p)
  %dead = icmp eq i8* %p, i8* null
  call void @llvm.invariant.end({}* %inv, i64 -1, i8* %p)
  ret void
}

If "isAllocSiteRemovable" is true, we should just delete invariant starts and ends. There may be a pile of them, like:

%inv1 = call {}* @llvm.invariant.start(i64 -1, i8* %p)
call void @llvm.invariant.end({}* %inv1, i64 -1, i8* %p)
%inv2 = call {}* @llvm.invariant.start(i64 -1, i8* %p)
call void @llvm.invariant.end({}* %inv2, i64 -1, i8* %p)
%inv3 = call {}* @llvm.invariant.start(i64 -1, i8* %p)
call void @llvm.invariant.end({}* %inv3, i64 -1, i8* %p)

and we don't care since we already checked that it's safe to delete them all. Just nuke away!

Also, don't dyn_cast + assert the result, just use cast<>. It has the assert inside.

lvoufo added a comment.EditedOct 9 2015, 5:04 PM

This definitely needs to be split up into a series of smaller patches to
be reviewable. I strongly suggest starting small, getting through one
cycle of review, then posting a couple more and recursing. Reading
through the diff, I see a lot of points which will need
discussion/clarification.

Also, a bit of background on what you're trying to accomplish is
required. A link to a private google doc is not sufficient.

Philip

Hi Philip -- This *is* the initial patch. I can't think of another way to break it down more effectively.
Thanks for pointing out the inaccessibility of the design doc.
It is not meant to be private. If one is unable to access, I believe s/he should be able to request access, which I will promptly grant (?).
Alternatively, I have made a copy available from my personal (and more public-capable) email account.
Please let me know if you encounter any more issue.

lvoufo updated this object.Oct 9 2015, 5:17 PM
lvoufo updated this object.Oct 9 2015, 7:58 PM
lvoufo marked 4 inline comments as done.Oct 9 2015, 8:28 PM
lvoufo added inline comments.
include/llvm/IR/GlobalVariable.h
48

I have been troubleshooting how to do this better. Will see what I can come up with... Thanks for the pointer.

include/llvm/IR/Instructions.h
79

Same as above.

lib/IR/Instructions.cpp
1155–1158

Good point. Thanks!

lib/Transforms/IPO/GlobalOpt.cpp
2247–2250

True. I was using it as a temporary informational tool.

lvoufo marked 2 inline comments as done.Oct 10 2015, 8:55 AM
lvoufo added inline comments.
lib/IR/Instructions.cpp
4028–4033

TODO: Revise this description and simplify the function signature.

lib/Transforms/IPO/Inliner.cpp
485–486

TODO: Remove this change.

This definitely needs to be split up into a series of smaller patches to
be reviewable. I strongly suggest starting small, getting through one
cycle of review, then posting a couple more and recursing. Reading
through the diff, I see a lot of points which will need
discussion/clarification.

Also, a bit of background on what you're trying to accomplish is
required. A link to a private google doc is not sufficient.

Philip

Hi Philip -- This *is* the initial patch. I can't think of another way to break it down more effectively.

Er, no. This patch currently involves changes to 5 different transform passes and 3 analysis passes. This can and must be split into smaller changes. I'd suggest striping out everything not required for *one* of the transform passes and iterate on that.

Thanks for pointing out the inaccessibility of the design doc.
It is not meant to be private. If one is unable to access, I believe s/he should be able to request access, which I will promptly grant (?).
Alternatively, I have made a copy available from my personal (and more public-capable) email account.
Please let me know if you encounter any more issue.

Thank you for sharing the doc publicly. It did help to explain the problem you're trying to solve, but it didn't really give me any insight into how your trying to solve it. Can you sketch out an architectural level summary of how you intent this to work?

dberlin edited edge metadata.Oct 15 2015, 8:49 AM
dberlin added a subscriber: dberlin.

I'm with Philip on this one.
This patch looks like it can be split up.
As a rule, you don't have to make everything go at once, you can do it
piece by piece.

(IE it's fine to modify one transform/analysis pass, even if nothing
but tests use it yet, and then add the use in a separate patch).

More to the point, the feedback you are getting says: "If you want
this reviewed by reviewers, you should split it up more".
So unless you can point to concrete reasons it can't be split up, you
should do that.

I'm with Philip on this one.
This patch looks like it can be split up.
As a rule, you don't have to make everything go at once, you can do it
piece by piece.

(IE it's fine to modify one transform/analysis pass, even if nothing
but tests use it yet, and then add the use in a separate patch).

More to the point, the feedback you are getting says: "If you want
this reviewed by reviewers, you should split it up more".
So unless you can point to concrete reasons it can't be split up, you
should do that.

Okay. Thanks for the clarifications. I'll see what I can do.

lvoufo added inline comments.Oct 19 2015, 8:24 PM
lib/Transforms/InstCombine/InstructionCombining.cpp
1966

I'm confused by this comment. What is wrong about this? Note that I here represents one of the users collected by "isAllocSiteRemovable" and which are being iterated through...

nlewycky added inline comments.Oct 19 2015, 8:49 PM
lib/Transforms/InstCombine/InstructionCombining.cpp
1966

You're right that my examples don't work because I misunderstood what 'I' is. However, I think I can still break this assert:

%inv1 = call {}* @llvm.invariant.start(i64 -1, i8* %p)
br i1 undef, label %cond_true, label %cond_false

cond_true:

call void @llvm.invarient.end({}* %inv1, i64 -1, i8* %p)
unreachable

cond_false;

call void @llvm.invarient.end({}* %inv1, i64 -1, i8* %p)
unreachable

In this case 'I' is %inv1 (one of the users of %p) and it's safe to remove according to isAllocSiteRemovable, yet it has neither zero nor one uses.

lvoufo added inline comments.Oct 19 2015, 9:25 PM
lib/Transforms/InstCombine/InstructionCombining.cpp
1966

Ah. Now I understand this one. Thanks!

lvoufo marked 11 inline comments as done.Oct 19 2015, 11:15 PM
lvoufo added inline comments.
lib/Transforms/IPO/GlobalOpt.cpp
2640–2641

The real reason why this version of processInvariantIntrinsics() is necessary is that EvaluateBlock() could exit before processing the invariant intrinsic call (and marking global variables 'writeonce'), e.g., if a call to a function declaration that we cannot constant fold occurs before the intrinsic call.

This processInvariantIntrinsics() complements Evaluator::EvaluateBlock() by pre-traversing the call-stack tree on the same Evaluator instance. The alternative would be to delay exits from Evaluator::EvaluateBlock() while remembering whether to exit with 'false' or 'true'; which would be a messier extension than separating the functions.

I'm rewriting this function with respect to other comments below anyway...

lvoufo marked 5 inline comments as done.Oct 20 2015, 6:37 PM

I think I've addressed all the comment by now. I'll be pushing in split-up patches (with corrections) next.

lib/Analysis/BasicAliasAnalysis.cpp
500

GVs, like AIs, are marked "writeonce written" (via 'setInvariantStartInstruction') immediately after initialization.
This test parallels a similar test above on AIs. That is, it should only succeed for non-"writeonce written" and non-constants.
Skipping instructions in MemoryDependenceAnalysis does not quite work in more general cases, such as where either the instantiated object itself is not const, but has const fields.
IIRC, I also ran into situations where GEPs where not eliminated as they should with their subesequent eliminated loads...
For the cases where this does work, I would rather leave it in a separate "improvement" patch.

lvoufo marked an inline comment as done.EditedOct 22 2015, 9:35 PM

I have addressed all the feedback so far (Thanks, Nick!) and split the updated version of this patch into 6-7 patches: D14003 (D14008 + D14009), D14004, D14005, D14006 and D14007.
Is this clearer? Would it help to split it up further?

Thanks,
--L.

lvoufo added inline comments.Dec 4 2015, 5:52 PM
lib/IR/Instructions.cpp
1155–1158

Actually, I'm not sure that this works. For example, take an alloca outside of a branch that contains an invariant_start...

nlewycky added inline comments.Dec 8 2015, 9:47 PM
lib/IR/Instructions.cpp
1155–1158

Sorry, you're right. To check that they're the same function, it's "II->getParent()->getParent() == getParent()->getParent()".

asb added a subscriber: asb.Dec 8 2015, 10:59 PM
asb added inline comments.
lib/IR/Instructions.cpp
1155–1158

Actually rL254975 added Instruction::getFunction

lvoufo marked 3 inline comments as done.Dec 9 2015, 3:20 AM
lvoufo added inline comments.
lib/IR/Instructions.cpp
1155–1158

Thanks.