Fri, Jul 12
Thu, Jul 11
Wed, Jul 10
How aggressive is LLVM's UB handling? Would it remove an entire block/function if UB is found in it?
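As an illustrative sketch of the principle in question (hypothetical function, not from any patch under review): LLVM doesn't delete a whole function just because UB is reachable somewhere inside it; rather, a path that unconditionally executes UB may be assumed unreachable, which can fold away the branch leading to it.

```c
/* Hypothetical illustration: a path that unconditionally executes UB
 * (division by zero) may be assumed unreachable, so the optimizer is
 * entitled to fold the guard away; the rest of the function survives. */
int guarded_divide(int n, int d) {
    if (d == 0)
        return n / d;  /* UB on this path: may be treated as never taken */
    return n / d;      /* well-defined for d != 0 */
}
```

With optimizations on, a compiler may lower this to a single unguarded division, but calls with nonzero `d` still behave normally.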
Wed, Jul 3
Or actually, it might make more sense to change the way we generate/lower callbr, to make the label parameters implicit; instead of modeling them with blockaddress, we never write them in IR at all, and automatically generate them later based on the successor list.
The question is whether something like the following is legal:
If they're all syntactically together like this, maybe that's safe?
Do the changes to BuiltinsARM.def have any practical effect? long should be 32 bits on all 32-bit ARM targets.
If there's some rule that distinguishes blockaddresses used in callbr from general blockaddresses, we should state that explicitly somewhere in LangRef.
Is this transform safe? The inline asm could stash the address of a destination in a variable in one loop iteration, and use it in a later loop iteration. Or is that not legal?
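A minimal sketch of the hazard, using GNU C computed gotos (a GCC/Clang extension) in place of inline asm; the function name is invented for illustration. The address of a destination block is captured in one iteration and only branched to in a later one, so a transform that duplicates or relocates the destination per-iteration would invalidate the stashed address.

```c
/* Sketch (GNU C computed gotos standing in for the inline-asm case):
 * iteration 0 stashes the address of "dest"; iteration 1 jumps via the
 * stashed value rather than a direct branch. */
int stash_then_jump(void) {
    void *stash = 0;
    for (int i = 0; i < 2; i++) {
        if (i == 0)
            stash = &&dest;   /* iteration 0: capture the address */
        else
            goto *stash;      /* iteration 1: branch via the saved address */
    }
    return 0;                 /* not reached with this trip count */
dest:
    return 1;
}
```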
I wrote a very similar patch at one point, but I didn't submit it because I couldn't demonstrate any significant benefit. Then again, I only tested it with Thumb1; maybe it's more useful for Thumb2?
The `b .LBB0_10` at line 29 is created after running the branch-folder pass.
I don't think this transform is valid, for the same reasons we don't do it in IR optimizations.
Tue, Jul 2
Fix variable name.
I'm a little wary about folding the case where doesNotAccessMemory is false, but I guess it's likely okay.
Is this documented in LangRef somewhere? I'm not seeing anything that indicates the blockaddress has to be an operand of the callbr, and I'm not sure why it would be necessary.
Mon, Jul 1
This seems like simple tail duplication, which the target-independent taildup pass should handle. Can you give an example which taildup doesn't handle?
Yes, we should probably teach DAGCombine to choose the right form for each target/type.
Fri, Jun 28
Maybe worth explicitly mentioning in the commit message that AggressiveDeadCodeElimination::updateDeadRegions loops over BlocksWithDeadTerminators.
Since this is an extension, it would be great to have an (on-by-default, or at least in -Wall) diagnostic for every use of this extension.
I agree the change to the API makes sense.
Thu, Jun 27
The new approach to tracking expressions inside of __builtin_preserve_access_index seems okay.
The machine verifier would fail if only RDDSP was used, without some other instruction previously defining a register.
If we say the result of the constrained intrinsic is poison, will the fact that the intrinsic is defined as having side effects keep it from being eliminated?
Wed, Jun 26
I think IEEE-754 does define this
there is a possibility of a miscompile when threading an edge across the LH
Is this a general comment or referring to a change in Kevin's patch?
Maybe mention in the commit message that this is explicitly not addressing target-specific code, or calls to memcpy?
The problem with poison is that it eventually leads to UB, and then your program has no defined meaning. Practically, it might mean some codepath that involves a call to llvm.experimental.constrained.fptosi.i32.f64 could get folded away because it's provably UB, or something like that.
It looks like the arm64-neon-2velem.ll regressions are a shuffle lowering issue, yes; we're creating a DUPLANE where the operand is an extract_subvector, and it doesn't simplify.
Tue, Jun 25
What happens if the input float is out of range? fptosi/fptoui instructions produce poison; not sure if you want that here.
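A hedged sketch of one way to avoid the poison result: saturate out-of-range inputs before converting. The function name and the saturating policy are invented for illustration; the bounds are chosen so the comparisons themselves are exact in double.

```c
#include <limits.h>
#include <math.h>

/* Hypothetical saturating float-to-int conversion: sidesteps the
 * undefined (poison) result of an out-of-range fptosi by clamping.
 * 2147483648.0 == INT_MAX + 1 is exactly representable as a double,
 * so the range checks are exact. */
int saturating_fptosi(double x) {
    if (isnan(x))
        return 0;                 /* arbitrary choice for NaN */
    if (x >= 2147483648.0)
        return INT_MAX;
    if (x < -2147483648.0)
        return INT_MIN;
    return (int)x;                /* in range: conversion is well defined */
}
```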
Fix Thumb1FrameLowering::emitPrologue and Thumb1FrameLowering::emitEpilogue so they don't scavenge a register inside of frame setup/teardown. Instead, just pick a register we know is available.
The third instruction in the function looks like it is using r6 as the base pointer to save one of the argument registers on the stack, but r6 isn't set until further down.
The problem here is that the IR semantics don't allow this transform in general.
Sorry, I meant to reply to this earlier. If you've tested the performance on PowerPC, it's probably fine. ARM has fewer registers, but not so few that it's likely to cause problems here.
Mon, Jun 24
Improved comment on ExtraCSSpill.
This looks a lot like a bug I hit a while ago while writing a fuzzer for ABI compatibility between clang and gcc. I've currently got Thumb1-only targets disabled because of this.
Fri, Jun 21
It looks like you didn't change all the uses of isFast() in optimizePow?
It would be nice to use the exact necessary fast-math flags here, while we're thinking about it, instead of just "isFast()". From the discussion, it seems like we only need "afn"?
I believe it's the default strategy for O0 AArch64 codegen these days
Thu, Jun 20
Tests committed in r363991 (https://reviews.llvm.org/rL363991)
I think this is sound now.
The testcase I actually wanted, which incorrectly forms a tail call:
Wed, Jun 19
Should clang translate that call into "sqrtf" to be more accurate?
In AVR, push instructions are used to store arguments. Therefore, the order of the store (push) instructions can't be changed.
So we would need to mark every instruction that uses any part of DSPControl as having a post-isel hook, and there are more than 100 such instructions.
How do we test that?
Tue, Jun 18
I'm just laying out the basic requirements for getting this patch back in, because the current patch is invalid given LLVM's current requirements.
If we're going to insert emms instructions automatically, it doesn't really make sense to do it in the frontend; the backend could figure out the most efficient placement itself. (See lib/Target/X86/X86VZeroUpper.cpp, which implements similar logic for AVX.) The part I'd be worried about is the potential performance hit from calling emms in places where other compilers wouldn't, for code using MMX intrinsics.
AVR intentionally makes the C type "double" a single-precision float ("float" in IR). So we could fail to recognize "fabs" etc. on AVR with this patch, I think?
Are you saying that using MMX in LLVM requires source-level workarounds in some way, and so we can't lower portable code to use MMX because that code will (reasonably) lack those workarounds?
Now, we could theoretically use a different ABI rule for vectors defined with Clang-specific extensions, but that seems like it would cause quite a few problems of its own.
Mon, Jun 17
Switched to a single vector for mapping symbols. Switched isArmElf() to use getEMachine(). Updated title/commit message.
Thu, Jun 13
Also, it looks like you never cc'ed llvm-commits; please abandon this and post a new revision with llvm-commits properly cc'ed.
Please fix the "summary" to include the full expected commit message.
Is the isLoopEntryGuardedByCond actually proving what you need it to prove? Even if Start-Stride is in the range [0, End), that doesn't necessarily imply Start-Stride doesn't overflow. For example, suppose Start is 0, End is -1, and Stride is 2.
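A worked version of that counterexample in C, with the same values (the helper name is invented): range membership alone does not prove the subtraction didn't wrap.

```c
#include <stdint.h>

/* Checks the counterexample above: Start - Stride can land inside
 * [0, End) even though the subtraction wrapped, so proving membership
 * in the range does not prove the subtraction didn't overflow. */
int in_range_but_wrapped(uint32_t Start, uint32_t End, uint32_t Stride) {
    uint32_t sms = Start - Stride;           /* modular: may wrap */
    return (sms < End) && (Stride > Start);  /* in [0, End) AND wrapped */
}
```

With Start = 0, End = (uint32_t)-1, and Stride = 2, `Start - Stride` wraps to 0xFFFFFFFE, which still lies inside [0, End).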