Add to safeCxtI() a default context for function arguments which accepts assumptions defined in the entry block that are valid anywhere in the function.
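For concreteness, here is a minimal sketch of the idea from the summary as applied to safeCxtI() in llvm/lib/Analysis/ValueTracking.cpp. It is not the actual diff (which, as the discussion below notes, scans further into the entry block); the Argument case shown here is illustrative only.

```cpp
static const Instruction *safeCxtI(const Value *V, const Instruction *CxtI) {
  // If we've been provided with a context instruction, use that (provided it
  // has been inserted).
  if (CxtI && CxtI->getParent())
    return CxtI;

  // If the value is really an already-inserted instruction, use that.
  CxtI = dyn_cast<Instruction>(V);
  if (CxtI && CxtI->getParent())
    return CxtI;

  // Sketch of the new default: for a function argument, any instruction in
  // the entry block is a usable context, since assumptions established there
  // hold throughout the function.
  if (const auto *A = dyn_cast<Argument>(V))
    if (!A->getParent()->isDeclaration())
      return &A->getParent()->getEntryBlock().front();

  return nullptr;
}
```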
Event Timeline
Am I understanding correctly that this tries to use the last instruction in the entry block rather than the first one to avoid triggering the ephemeral value check, in case the first instruction is part of an assumption?
In any case, I don't think it's appropriate to perform a full block scan to determine the context instruction. safeCxtI() should be cheap (as in O(1)).
Sorry for the delay @nikic.
Yes, the first instruction is fine, except when an assume (along with its ephemeral values) appears as the first thing in the function, which in the case of an argument seems quite likely.
> In any case, I don't think it's appropriate to perform a full block scan to determine the context instruction. safeCxtI() should be cheap (as in O(1)).
An unbounded scan of the entry block might indeed be too much even if applied only to arguments. The scan can perhaps be limited to some small number of instructions (5 seems like the minimum to cover most patterns in computeKnownBits()) which gets reset if an assume is encountered.
Another possibility would be to leave the context instruction at nullptr, and instead adjust isValidAssumeForContext to accept a nullptr CxtI in which case only instructions that are must-exec from the function entry are considered. Advantage is that it only incurs a cost if there is a potentially relevant assume.
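For illustration, the "must-exec from the function entry" condition could be checked with something like the helper below; the name and exact shape are made up here and are not part of any existing patch, but isValidAssumeForContext() could call something like it when CxtI is null.

```cpp
// Hypothetical helper: true if the assume is guaranteed to execute whenever
// the function is entered, i.e. the assumption is valid at any point in the
// function and a null CxtI could accept it.
static bool isAssumeValidEverywhere(const Instruction *Inv) {
  const BasicBlock &Entry = Inv->getFunction()->getEntryBlock();
  if (Inv->getParent() != &Entry)
    return false;
  for (const Instruction &I : Entry) {
    if (&I == Inv)
      return true;
    // Everything before the assume must be guaranteed to transfer execution
    // to the next instruction (no potentially throwing or non-returning
    // calls, etc.).
    if (!isGuaranteedToTransferExecutionToSuccessor(&I))
      return false;
  }
  llvm_unreachable("assume not found in its own parent block");
}
```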
Right. Setting a safe context seemed less intrusive than letting isValidAssumeForContext take a nullptr CxtI, which requires changes in some of the callers. But on a closer look it seems it's only simplifyICmpWithDominatingAssume(), computeKnownBitsFromAssume() and computeConstantRange() that might get a nullptr context, and the modifications there seem relatively minor. Another option might be to find a safe context in those callers for the first assume, caching and updating it for the other assumptions as needed. Will give it a try.
FWIW, I agree with @nikic: we should not put this logic here. There are two problems:
- We compute something we might not need.
- We do it only in value tracking.
I'd prefer the following, though the nullptr proposal seems fine too:
a) For arguments, use the first instruction in the entry block as context; this is trivial and correct.
b) When the context is used, e.g., to look for assumptions, allow some exploration of the surrounding instructions.
Thanks for taking a look, Johannes!
> I'd prefer the following, though the nullptr proposal seems fine too:
> a) For arguments, use the first instruction in the entry block as context; this is trivial and correct.
Agreed. Will limit this patch to this low-hanging fruit. Handling null contexts in isValidAssumeForContext() is indeed more general, but it also seems to have greater potential for causing trivial simplification of ephemeral values. If it works out, it would replace the safe context for arguments.
> b) When the context is used, e.g., to look for assumptions, allow some exploration of the surrounding instructions.
Not sure I see how (beyond the existing extension to reachable instructions). Since isValidAssumeForContext() can't tell why its context was chosen, it must assume the context might be protecting an ephemeral from simplification, right?
The idea is that you can always do what you do here in isValidAssumeForContext() (and friends). I just checked and we seem to do so already. Could you explain to me why we need to go for the last instruction in this patch at all? What would happen if you simply picked the first instruction in the entry block, which is trivially correct? (Note: you can skip llvm.assume to make some weird problems go away.)
> The idea is that you can always do what you do here in isValidAssumeForContext() (and friends). I just checked and we seem to do so already. Could you explain to me why we need to go for the last instruction in this patch at all? What would happen if you simply picked the first instruction in the entry block, which is trivially correct? (Note: you can skip llvm.assume to make some weird problems go away.)
The patch indeed defaults to the first instruction in the entry block. Scanning to the end in safeCxtI() was an optimization for cases where the first instruction is an ephemeral value of an assume (which seems quite likely for assumes about arguments), since that would get the assume discarded by the isEphemeralValueOf(Inv, CxtI) check. In isValidAssumeForContext() we can't distinguish between a context given only as a control-flow marker and a context that also guards against simplifying an ephemeral, so we can't try to improve it there; but since any context safeCxtI() provides for a null CxtI is just a control-flow marker anyway, we might as well choose one that's not an ephemeral of any assume in the entry block. Does that make sense?
Yes. But your code does "more" than that. All you want is this pseudo-code, right?
```cpp
auto It = EntryBlock.begin();
while (isaAssumeIntrinsic(*It))
  ++It;
return &*It;
```
Yes, if isaAssumeIntrinsic stands for "is an assume or an ephemeral of an assume". I suggested something like that in a previous comment:
> An unbounded scan of the entry block might indeed be too much even if applied only to arguments. The scan can perhaps be limited to some small number of instructions (5 seems like the minimum to cover most patterns in computeKnownBits()) which gets reset if an assume is encountered.
(which could still of course skip over perfectly good instructions and choose an ephemeral as context if the assumes were not written right at function entry)
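To make the bounded-scan variant concrete, here is one possible shape; the budget of 5, the helper name, and the fallback behaviour are guesses for illustration rather than what the patch actually does.

```cpp
// Hypothetical sketch: pick a context for an argument from the entry block,
// but restart the search whenever an assume is seen, since everything that
// preceded it might be one of its ephemeral values.
static const Instruction *entryCtxForArgument(const Function &F) {
  const Instruction *Cand = nullptr;
  unsigned Budget = 5;
  for (const Instruction &I : F.getEntryBlock()) {
    if (const auto *II = dyn_cast<IntrinsicInst>(&I))
      if (II->getIntrinsicID() == Intrinsic::assume) {
        Cand = nullptr; // earlier candidates may be ephemerals of this assume
        Budget = 5;     // keep looking a little further
        continue;
      }
    if (!Cand)
      Cand = &I;
    if (--Budget == 0)
      break;
  }
  // May still return an ephemeral if a later assume sits past the budget,
  // which is exactly the caveat mentioned above.
  return Cand;
}
```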
No problem, a chance to verify my understanding:
So a function such as
```cpp
int f(int num) {
  __builtin_assume((num & 3) == 0);
  return num + 1;
}
```
compiles under -O3 -g0 -S -emit-llvm into:
```llvm
define dso_local i32 @_Z1fi(i32 %0) local_unnamed_addr #0 {
  %2 = and i32 %0, 3
  %3 = icmp eq i32 %2, 0
  tail call void @llvm.assume(i1 %3)
  %4 = or i32 %0, 1
  ret i32 %4
}
```
Computing the known bits of %0 with %2 as the context would have isValidAssumeForContext() return false on the assume, since %2 is one of its ephemeral values. Otherwise, simplifying %2 by computing %0's known bits would be logical bootstrapping: %0's assumed zero bits would be used to simplify %2 to zero, then %3 to true, and the assume would be lost.
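Spelling that out with the queries involved (knownBitsOfArgAt is a made-up wrapper around the real computeKnownBits() entry point, shown only to contrast the two contexts):

```cpp
// Hypothetical wrapper, just to show how the context changes the answer for
// %0 in the IR above.
KnownBits knownBitsOfArgAt(const Argument &Arg, const Instruction &CxtI,
                           AssumptionCache &AC, const DataLayout &DL) {
  // With CxtI == %2: isValidAssumeForContext(assume, %2) is false because %2
  // is an ephemeral value of the assume, so nothing is known about %0 here.
  // With CxtI == %4: the assume applies, and the low two bits of %0 are known
  // to be zero (which is what turned the add into an or).
  return computeKnownBits(&Arg, DL, /*Depth=*/0, &AC, &CxtI);
}
```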
Oh, this is bad. That happens when you mix two logically different things into a single instruction pointer. I put it on the list of things that need to be fixed wrt. assumes... :(
Thanks for the patience. I'm fine with the nullptr solution for now. I need to think more about this "workaround" before I'd suggest an alternative.
Excellent. Is the list published anywhere?
> Thanks for the patience. I'm fine with the nullptr solution for now. I need to think more about this "workaround" before I'd suggest an alternative.
Sure, thanks for taking the time!
No, but maybe I should do that. Here is what I remember right now:
- Distinguish between the context location at which an assumption is queried and the instruction(s) for which it is queried. Basically, that is what is going "wrong" here. I assume tracking which assumptions have been used is the most efficient approach: if some were used and their operands include the instruction to be optimized, don't do the optimization. Alternatively, pass the instruction and avoid dependent assumes. Other options are sensible as well.
- We did 1) from https://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html but not 2) and 3) yet. 2) is needed to lower assert in release mode and arbitrary assumes to IR. 3) is needed, among other things, for pragma omp assumes and pragma omp assume, see https://clang.llvm.org/docs/AttributeReference.html#assume and https://www.openmp.org/spec-html/5.1/openmpsu37.html#x56-560002.5.2
- This was just a WIP and needs verification and test changes: https://reviews.llvm.org/D89054. This will most likely flush out a few bugs in our non-speculatable call handling.
- Rewriting more "boolean" assumptions to high-level assume bundles. We could, for example, do pointer comparisons with "eq/neq"(%ptr1, %ptr2) and then teach the capture checker that those uses do not capture.
- Continue to preserve knowledge whenever we modify the IR. There were some large regressions before; I hope D89054 will take care of some of that.
If you are interested in any of this, or something related, please let me know.
clang-format: please reformat the code