I have almost this exact change in a WIP patch, but I didn't get around to posting it because I didn't have a testcase, and I ended up deciding to handle the Thumb1 estimation differently.
Fri, Sep 13
I guess in theory it's possible, for example, for a load to be ModRefInfo::NoModRef because it loads from a readonly global, and to write a pass that depends on the assumption that we won't create a MemoryUse for that load. (I think we actually create a MemoryUse for that at the moment, but it could change.)
Why does an instruction which doesn't read or write memory have an associated MemorySSA memory access? Do we assume everything accesses memory if basicaa is disabled? Would it make sense to fix that, instead of adding checks which always fail normally?
Thu, Sep 12
Wed, Sep 11
Minor fix for ARMTargetParser: vfp2sp should be FPURestriction::SP_D16. I don't think this has any practical effect beyond making clang emit the "+vfp2sp" attribute in all the cases where it's legal.
Tue, Sep 10
It looks like the change is essentially a no-op for constant hoisting itself? Or maybe I'm misreading the code somehow? But sure, I'm fine with generally hoisting the check out of canReplaceOperandWithVariable, for all three of the current callers.
I don't really have any concerns about this from the perspective of correctness; any breakage will be obvious and easy to fix.
Note: I have no idea why git has decided that I've made a change to an MC test.
This can help reduce register pressure.
Mon, Sep 9
This makes more sense than D67076, I think.
Maybe not worth trying harder. There are other reasons we might not want to introduce constant pools that require relocations.
In some cases it might be possible to form a vector constant pool entry. We don't do that currently, though.
Fri, Sep 6
My motivation for this change is to fix a case where we end up with a non-deterministic use-list order after simplifycfg, which I think is caused by this trashing of the use-lists.
Do you mean in general, or for SCEV?
I think a few of the functions we use ConstantFoldFP with actually already have implementations in APFloat: floor, ceil, round, fabs, fmod.
That looks mostly fine, then. Maybe a few more cmovs than I'd like... but close enough.
I'm not really happy with adding more uses of ConstantFoldFP/ConstantFoldBinaryFP; they're flawed because they produce results that depend on the host's libm implementation. But I guess there isn't any reason to support log and not log2.
I'm trying to understand the issue you're seeing... I guess it comes down to something like the following?
Thu, Sep 5
Got rid of a few changes that were clearly unnecessary, because we never iterated over the maps in question.
I've also blocked INT_MIN as the transform isn't valid for that.
In your test case, we hit the early return that I linked to, so we don't try to clone, and we don't need to emit an error.
Oops, meant to actually include the testcase in my last comment:
In the MS ABI, deriving a new class may require the creation of new thunks for methods that were not overridden, so we can't use the same trick.
Basically, the only rule is that you should not speculatively introduce a conditional branch on a value that might be undef, where that branch is not guaranteed to execute in the input IR.
Your description of the issue uses "||" in the pseudo-code; I assume you actually mean a non-short-circuiting "|"?
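To make the distinction concrete, here is a small C++ sketch (all names are illustrative): `||` skips its right operand when the left one is true, while `|` on two `bool`s always evaluates both operands, which is closer to an IR `or` of two already-computed `i1` conditions.

```cpp
#include <cassert>

// Counts how many times the right-hand condition is actually evaluated.
struct CallCounter {
  int calls = 0;
  bool check(bool result) {
    ++calls;
    return result;
  }
};

// Short-circuiting: check() is skipped when lhs is already true.
bool shortCircuitOr(bool lhs, CallCounter &c, bool rhs) {
  return lhs || c.check(rhs);
}

// Non-short-circuiting (intentional bitwise '|'): check() always runs.
bool bitwiseOr(bool lhs, CallCounter &c, bool rhs) {
  return lhs | c.check(rhs);
}
```

Both functions return the same truth value; they differ only in whether the right-hand side is evaluated.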
Is there a potential correctness issue here, if some operation that isn't the exit condition or the IV increment somehow ends up in the new latch?
It seems like there are multiple problems here if "BB" gets deleted eagerly: there's the AssertingVH failure you mention, LVI->eraseBlock doesn't work, and the loop "for (auto &BB : F)" also breaks. So I don't think this patch really solves anything.
Wed, Sep 4
In terms of general API, I don't think we want to expose "applyLoopGuards"; the SCEV transform proposed here isn't really useful outside of trying to find the minimum or maximum, as far as I can tell. Which min/max expressions we want to form depends on whether we're computing a "max" or a "min". And restricting the API so the point in the CFG we're querying has to be a loop header doesn't seem helpful; other places might care about values after a loop etc.
Patch uploaded without context.
So, what metric specifically do you want to see: a count of CMOV instructions at the end of codegen, and how that count changes with this patch?
I'm not sure how you never hit this issue before. I mean, I can see from the testcase that the compare for the outer loop's branch can end up in the inner loop... but why is it not *always* in the inner loop? What criteria are we using to sink it in some cases? Should we be sinking in all cases?
New changes look okay... but maybe someone else should look too, since I've missed multiple serious issues with the CFG updating code.
Aggressively flattening the CFG has tradeoffs. If the branch is very unpredictable, or it unblocks some important optimization, it can have a huge benefit. If you don't fall into one of those cases, you're mildly degrading the performance of a bunch of code, by forcing the execution of instructions where the result isn't used.
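A source-level C++ sketch of that tradeoff, using hypothetical functions: the branchy version only does the expensive work when it's needed, while the flattened version (roughly what a select-based lowering computes) always does it and throws the result away when the condition is false. Real codegen may differ; a compiler can still turn either form into a branch or a select.

```cpp
#include <cassert>

// Hypothetical "expensive" computation; the counter records how often it runs.
static int expensiveCalls = 0;
int expensive(int x) {
  ++expensiveCalls;
  return x * x + 1;
}

// Branchy form: expensive() runs only when cond is true.
// Cheap when the branch is predictable, costly when it mispredicts.
int branchy(bool cond, int x) {
  if (cond)
    return expensive(x);
  return x;
}

// Flattened form: expensive() always runs, and its result is unused when
// cond is false. No branch to mispredict, but extra work on every call.
int flattened(bool cond, int x) {
  int t = expensive(x);
  return cond ? t : x;
}
```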
Instead of writing a C++ unittest, you should be able to use "opt -analyze -scalar-evolution" to test this.
Tue, Sep 3
I'm afraid there's a latent bug in the interaction between ShrinkWrap and PEI, but its effect might be hard to spot; if we allocate an unnecessary stack object before PEI, nobody is likely to notice.
This seems much better.
Fri, Aug 30
Do we have test coverage for a variadic, covariant thunk for a function without a definition? I don't think there's any way for us to actually emit that, but we should make sure the error message is right.
This method seems to be called by several passes:
Could you describe the complete flow here? If getAccumulator() returns a value, I think I see how it works; the type of that value is the type of the final accumulator, and you need to sign-extend multiplies to match that type. If getAccumulator() returns null, I don't see how this is supposed to work; it looks like the code arbitrarily decides the accumulator should be 32 bits.
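For the case where getAccumulator() returns a value, the flow as I read it can be modeled in plain C++ (a hypothetical stand-in, not the pass's actual code): the accumulator's type is fixed first, and each 16-bit operand is sign-extended to that width before the multiply, so negative products keep their sign and sums that exceed 32 bits are still represented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Multiply-accumulate over 16-bit inputs into a 64-bit accumulator.
// Each operand is sign-extended to the accumulator's width (int64_t here)
// before multiplying, mirroring "sign-extend multiplies to match the
// accumulator type".
int64_t macSignExtended(const int16_t *a, const int16_t *b, size_t n,
                        int64_t acc) {
  for (size_t i = 0; i < n; ++i)
    acc += static_cast<int64_t>(a[i]) * static_cast<int64_t>(b[i]);
  return acc;
}
```

The open question above is the null-accumulator case: there is no existing value to take the type from, so some width has to be chosen, and hard-coding 32 bits is one arbitrary choice.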
Thu, Aug 29
In the context of __builtin_frame_address, an arbitrary limit is probably okay. Maybe something like 0xFFFF, which is larger than anyone would realistically use, but doesn't take a crazy amount of time to compile.
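A minimal sketch of such a check, assuming the level argument has already been evaluated to an unsigned constant. `kMaxFrameAddressLevel` and `checkFrameAddressLevel` are hypothetical names, not existing Clang APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical upper bound: larger than any realistic frame depth, but small
// enough that generating code to walk that many frames stays cheap.
constexpr uint64_t kMaxFrameAddressLevel = 0xFFFF;

// Returns a diagnostic string if the level is out of range,
// or std::nullopt if it is acceptable.
std::optional<std::string> checkFrameAddressLevel(uint64_t level) {
  if (level > kMaxFrameAddressLevel)
    return "frame address level must be at most " +
           std::to_string(kMaxFrameAddressLevel);
  return std::nullopt;
}
```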
It's probably worth adding testcases for 0.5 and -0.5. I think the current implementation behaves correctly, but it would be easy to mess up with a small change to the code.
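Halfway values are the interesting inputs because that's exactly where rounding conventions disagree. Assuming the code under review implements some round-to-nearest operation (the comment doesn't say which), the standard C++ functions below show how 0.5 and -0.5 diverge between round-half-away-from-zero and round-half-to-even, including the sign of the zero result.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

// std::round: halfway cases round away from zero (0.5 -> 1, -0.5 -> -1).
double roundAway(double x) { return std::round(x); }

// std::rint: uses the current rounding mode; under the default
// round-to-nearest-even mode, 0.5 -> +0.0 and -0.5 -> -0.0.
double roundEven(double x) {
  assert(std::fegetround() == FE_TONEAREST); // assumes the default mode
  return std::rint(x);
}
```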
Why are we calling determineCalleeSaves in LiveDebugValues, anyway? Can't it just call getCalleeSavedInfo()?
Tue, Aug 27
We usually prefer to generate error messages for incorrect parameters to builtins in SemaChecking.cpp.
Mon, Aug 26
Can we remove the CanBeNull argument from getPointerDereferenceableBytes()? It looks like it's currently unused. Or are you planning to use it somewhere?
Fri, Aug 23
Is there an llvm-dev thread for the general project? It looks fine to me, but maybe it should have a wider audience.
On a side-note, we really should try to come up with a better algorithm for forming pre/post-increment operations; hasPredecessorHelper is slow.
If you want to really expand out the meaning of "CanBeNull", it means "was the number of dereferenceable bytes computed using a dereferenceable_or_null attribute/metadata". It has nothing to do with whether a null pointer is generally valid in the given address space. The logic has always worked this way, since before it was extracted into a separate function in D17572.
getPointerDereferenceableBytes returns some number of dereferenceable bytes. If CanBeNull is true, that result is modified: if the pointer value is null, the number of known dereferenceable bytes is actually zero.
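A simplified model of that contract, using hypothetical names (`DerefInfo`, `knownDereferenceableBytes`) rather than the real LLVM signature:

```cpp
#include <cassert>
#include <cstdint>

// Result of the query: a byte count, plus whether that count came from a
// dereferenceable_or_null attribute/metadata (canBeNull) rather than a
// plain dereferenceable one. The flag says nothing about whether null is a
// valid address in the pointer's address space.
struct DerefInfo {
  uint64_t bytes;
  bool canBeNull;
};

// How a caller should interpret the result for a specific pointer value:
// if the count came from dereferenceable_or_null and the pointer actually
// is null, zero bytes are known to be dereferenceable.
uint64_t knownDereferenceableBytes(DerefInfo info, bool pointerIsNull) {
  if (info.canBeNull && pointerIsNull)
    return 0;
  return info.bytes;
}
```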
Thu, Aug 22
Thinking about it a bit more, that definition is probably okay.
As I described, the compare will capture only if the other pointer was captured. So, in addition to the checks in the patch, ask the tracker whether that is the case.
Oh, I somehow forgot that was legal. :( That breaks this whole approach (well, maybe the lambda could capture "w", but that seems way too complicated). So we're left with a few possibilities:
Wed, Aug 21
Added a few more minor comments.
(a) have two MCInstrAnalysis (how), (b) pass an extra parameter to evaluateBranch, or (c) make it stateful?
Yes, you can't capture a pointer that's already captured.
These are redundant because they're new nodes? Sure, LGTM.
Tue, Aug 20
Why are the changes to CloneModule.cpp necessary?
Changing this led very quickly to a world of pain
I'd rather analyze whether the memory is valid separately from whether the pointer itself is captured, I think? Removing nocapture markings makes all the existing uses of capture analysis weaker.
Is there some practical issue I'm missing?