- User Since
- Aug 10 2016, 1:07 PM (97 w, 4 d)
Fri, Jun 22
I could have used DenseMapInfo::getTombstoneKey() here instead of Optional, I think. But making remove() swap might still be worthwhile; it would give random-access iteration, and more flexibility with the keys (with a different underlying hashtable, we could allow keys which don't have tombstones).
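For illustration, the swap-based remove() described above can be sketched like this (a hypothetical, simplified dense container, not the actual patch; the point is that swapping the last element into the removed slot keeps the live elements contiguous, so they can be iterated by index and no tombstone key is needed):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical dense container: elements stay packed at the front,
 * so iteration is plain random access over [0, size). */
typedef struct {
    int elems[16];
    size_t size;
} DenseSet;

static void dense_push(DenseSet *s, int v) { s->elems[s->size++] = v; }

/* Remove the element at index i by moving the last element into its
 * place: O(1), keeps the array dense, and requires no tombstone. */
static void dense_remove(DenseSet *s, size_t i) {
    s->elems[i] = s->elems[--s->size];
}
```

The trade-off is that removal reorders elements, which is fine when iteration order doesn't matter.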
We do linear lookups for NonOptPhis.erase() anyway
Thu, Jun 21
LGTM on the polly bits.
This is getting complicated enough that I'd like to see it split into two patches; one for the MachineOutliner option/configuration changes, and one to actually change the default for AArch64.
Also, LTO currently calls TargetLibraryInfo::disableAllFunctions to avoid this problem.
Maybe instead of trying to turn off optimizations on internal builtin functions, we should instead try to make optimizations not treat functions with local linkage as builtin? Not sure. This came up before on llvmdev, but we didn't really decide one way or the other.
Please fix the following to handle objects located at null:

- lib/Analysis/BasicAliasAnalysis.cpp: allow null pointers to alias other objects.
- lib/Analysis/LoopAccessAnalysis.cpp: allow a loop to dereference null.
- isGEPKnownNonNull in lib/Analysis/ValueTracking.cpp: allow objects located at null.
- ConstantFoldScalarCall in lib/Analysis/ConstantFolding.cpp: allow objects located at null.
- Argument::hasNonNullAttr in lib/IR/Function.cpp: allow objects at null.
LGTM assuming you're planning to regression-test this somehow.
For the test, I mean something like the following:
Wed, Jun 20
It would probably make sense to change SelectionDAGLegalize::ExpandNode to use this lowering for SADDO/SSUBO, instead of making this target-specific. The target-independent version uses approximately the same operations anyway, just in a less efficient way.
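The arithmetic this expansion boils down to can be sketched as follows (a hedged illustration of the standard signed-add-overflow check, not the actual SelectionDAGLegalize::ExpandNode code):

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the usual SADDO expansion: compute the wrapped sum, then
 * detect overflow, which happens iff both operands have the same sign
 * and the result's sign differs from theirs. */
static bool saddo32(int32_t a, int32_t b, int32_t *sum) {
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t us = ua + ub;            /* wrapping add */
    *sum = (int32_t)us;
    /* Overflow iff sign(a) == sign(b) and sign(sum) != sign(a). */
    return ((~(ua ^ ub)) & (ua ^ us)) >> 31;
}
```

SSUBO is the mirror image: overflow iff the operands have *different* signs and the result's sign differs from the first operand's.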
See also https://bugs.llvm.org//show_bug.cgi?id=37211 .
I tried your suggestion, but without further tuning in vector lowering this does not yield much gain on a vector store operation.
Tue, Jun 19
Nothing should ask for the result of an inline asm which produces zero values; that's fine. (The chain is handled separately.)
Maybe the use list order isn't affected in this exact testcase. In general, the order ssa.copy intrinsics are inserted matters, though.
I wonder if we should prefer to widen <2 x i8> and <4 x i8> to <8 x i8> instead of promoting to <4 x i16>. It would make stores like this a bit cheaper. Maybe an interesting experiment at some point (mostly just modifying AArch64TargetLowering::getPreferredVectorAction, I think, and seeing what happens to the generated code).
It should be possible to implement this sort of coercion with an appropriate cast, rather than just bailing out, I think. But I won't block the patch on that.
Mon, Jun 18
I don't think donothing counts as a meta-instruction, by your definition; yes, it has no effects and no operands, but it isn't ignored by the optimizer (it generally just gets erased, but that's not the same as ignoring it).
The getExtLoad/getTruncStore API seems really confusing, but I guess there's no simple way to fix it.
Only if the fcmp is marked with nnan or ninf, right?
In IR, if a pointer operand doesn't have the alignment given by the "align" annotation, the behavior is undefined. (If you don't explicitly specify the alignment, it defaults to the alignment specified in the DataLayout.) Similarly, in a MachineFunction, if a memory operation accesses an address which doesn't have the alignment given in the MMO, the behavior is undefined.
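In IR terms, the rule applies to annotations like the following minimal fragment (just an illustrative snippet):

```llvm
; %p must actually be 4-byte aligned at runtime; if it is not, this
; load has undefined behavior. With no explicit "align", the ABI
; alignment for i32 from the DataLayout is assumed instead.
%v = load i32, i32* %p, align 4
```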
It's straightforward to define the alternative semantics where it only applies at the point of the call/load. And it would still be useful to the optimizer. But the optimizer code would have to be written from scratch; the existing getPointerDereferenceableBytes API isn't usable with an attribute like that. It's probably worth doing at some point, though: we could prove other interesting things with a context-sensitive analysis. For example, we could prove that a pointer is dereferenceable using a previous load or store operation.
Also, I'd like to see some test coverage for the immediate patterns.
Did a bit more research into the GNU libatomic behavior. As far as I can tell, my initial assessment was correct: libatomic never uses unaligned atomic operations. However, it uses a trick which makes this a little complicated to test: it promotes small atomic operations to pointer size before performing the alignment check. So, for example, libatomic supports lock-free operations with size=3, depending on the input pointer. (On x86, it still only promotes up to pointer size, even though larger lock-free atomics are available; not sure why.)
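The promotion trick can be sketched roughly like this (an approximation for illustration, not libatomic's actual code; `maybe_lock_free` and its `word` parameter are hypothetical names): round the access up to the aligned pointer-sized word containing it, and it's a candidate for a lock-free operation iff it fits entirely inside that word.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Rough sketch of the described libatomic behavior: a small atomic
 * access is promoted to a pointer-sized access before the alignment
 * check, so e.g. a size-3 object can be handled lock-free when it
 * fits inside a single suitably-aligned word. */
static bool maybe_lock_free(size_t size, uintptr_t addr, size_t word) {
    if (size > word)
        return false; /* sizes above pointer size not considered here */
    /* Promote: the access covers the aligned word containing addr. */
    uintptr_t start = addr & ~(uintptr_t)(word - 1);
    return addr + size <= start + word; /* fits in one aligned word? */
}
```

So whether a size-3 operation is lock-free depends on where the input pointer falls within the word, matching the behavior described above.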
Fri, Jun 15
Oh, wait, you mean in the "Reviewers" field; nevermind.
@klimek I think he just replied to the llvm-commits email.
Can you give an example of what the output looks like?
Actually, move the dereferenceable changes out of this patch; they should all go together in a separate patch.
Any other comments on this? The !dereferenceable changes might be a bit controversial.
because values have slightly different labels
I went through those files, but many of these addressSpace() == 0 (and != 0) checks are not for null pointers but for disabling/skipping some memory optimizations. It is not clear to me whether moving all of those checks to the new NullPointerIsDefined function makes sense there.
Thu, Jun 14
It has to be one of undefined behavior, poison, undef, or an unspecified value as described here; there is no other way for something to be "undefined" in LLVM IR, unless you're proposing to introduce a new form of undefined-ness specifically for nnan/ninf.
My first guess would be some sort of non-determinism.
I'm generally not a fan of having features like this, which have widespread implications, turned on only for certain target CPUs; it tends to make it much harder to find bugs, since the code gets little testing. But I guess this is okay for now.
Wed, Jun 13
HonorSignDependentRounding is pretty clearly a failed experiment; I'd suggest killing it completely (in a separate patch).
On targets with NEON, we should prefer NEON vectorization (vmlal etc.) over DSP instructions in almost all cases. Is this intended specifically for targets which don't have NEON?
Oh, sorry, I mixed up the two values. I meant that you should add a couple testcases to ensure the stored value is correct on overflow.
I'd like to see a couple of testcases ensuring the return value is correct on overflow (we had a problem with that for __builtin_mul_overflow in the past).
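The kind of testcase meant here looks something like the following (assuming the GCC/Clang `__builtin_mul_overflow` builtin): on overflow, the return value must be true, and the stored result must be the infinite-precision product truncated to the result type.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Check both the return value and the stored value on overflow. */
static void check_mul_overflow(void) {
    int r;
    bool ov = __builtin_mul_overflow(INT_MAX, 2, &r);
    assert(ov);        /* return value signals the overflow */
    assert(r == -2);   /* INT_MAX * 2 == 0xFFFFFFFE, truncated to -2 */
    ov = __builtin_mul_overflow(3, 4, &r);
    assert(!ov && r == 12);
}
```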
You can introduce a target-specific SelectionDAG node without adding a corresponding IR intrinsic. See ISD::FIRST_TARGET_MEMORY_OPCODE.
GVNHoist is generally pretty stable; my team has used it to build a lot of code without any issues. There are a few known bugs (https://bugs.llvm.org/show_bug.cgi?id=37791, https://bugs.llvm.org/show_bug.cgi?id=36635, https://bugs.llvm.org/show_bug.cgi?id=37445 ) that we probably need to fix before turning it on by default.
Tests look good, at first glance.
Tue, Jun 12
__clzsi2 is provided by libgcc/compiler-rt, so it should be generally available. I guess it might be possible to construct an environment where it isn't, but I'm not sure how.
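For reference, a portable sketch of what __clzsi2 computes: the number of leading zero bits in a nonzero 32-bit value. This binary-search style is similar in spirit to the compiler-rt implementation, though not a copy of it.

```c
#include <assert.h>
#include <stdint.h>

/* Count leading zeros of a nonzero 32-bit value by halving the
 * search range at each step (undefined for x == 0, as with the
 * real libcall). */
static int clzsi2(uint32_t x) {
    int n = 0;
    if (x <= 0x0000FFFFu) { n += 16; x <<= 16; }
    if (x <= 0x00FFFFFFu) { n += 8;  x <<= 8;  }
    if (x <= 0x0FFFFFFFu) { n += 4;  x <<= 4;  }
    if (x <= 0x3FFFFFFFu) { n += 2;  x <<= 2;  }
    if (x <= 0x7FFFFFFFu) { n += 1; }
    return n;
}
```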
Mon, Jun 11
We should probably use this functionality to allow outlining instructions which access/modify SP, in cases where we don't need to spill LR. But that's probably better to do as a followup; it could have some unexpected effects on the heuristics.
I'm not checking for optsize because I don't think it makes sense to inline even when optimizing for speed... although maybe that's not right. The current code for Thumb1 is something like the following (which is essentially computing popcount(nextpoweroftwo(x)-1)).
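The computation named in the parenthetical can be sketched like this (an illustration, not the actual Thumb1 lowering; it assumes NextPowerOf2 behaves like LLVM's helper, returning the smallest power of two strictly greater than x, so the whole expression yields the number of bits needed to represent x):

```c
#include <assert.h>
#include <stdint.h>

/* Smallest power of two strictly greater than x (mirrors LLVM's
 * NextPowerOf2 helper). */
static uint32_t next_power_of_2(uint32_t x) {
    x |= x >> 1; x |= x >> 2; x |= x >> 4;
    x |= x >> 8; x |= x >> 16;
    return x + 1;
}

/* Kernighan's popcount: clears the lowest set bit each iteration. */
static int popcount32(uint32_t x) {
    int n = 0;
    while (x) { x &= x - 1; n++; }
    return n;
}

/* popcount(nextpoweroftwo(x) - 1): for x > 0 this is the bit width
 * of x, i.e. floor(log2(x)) + 1. */
static int bit_width(uint32_t x) {
    return popcount32(next_power_of_2(x) - 1);
}
```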
Clarify that the value produced isn't undef. (It's more like freeze(undef).)